MiniMax-M3 breaks the ceiling for open-weights models, combining a 1M token context window, native multimodality, and elite coding capabilities.

On June 1, 2026, the landscape of open-source artificial intelligence shifted fundamentally. MiniMax has officially released MiniMax-M3, a milestone model that bridges the gap between closed-source proprietary giants and the open-source community. For developers and AI engineers, this isn't just another incremental update; it is a paradigm shift in what we expect from open-weights architectures.
Historically, developers had to choose between the massive context windows of proprietary models or the flexibility and privacy of open-source models. MiniMax-M3 eliminates this compromise. By delivering frontier-level reasoning, massive context, and native multimodality in an open-weights format, MiniMax has set a new gold standard for the industry.
At the heart of MiniMax-M3 lies the proprietary MiniMax Sparse Attention (MSA) architecture. This breakthrough allows the model to handle a 1M token context window while remaining efficient. Unlike traditional dense attention, whose compute cost grows quadratically with sequence length, MSA lets the model process very long inputs with significantly reduced computational overhead.
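To make the scaling argument concrete, here is a rough back-of-the-envelope sketch in Python. It is not MSA itself, whose internals are not described here; it simply compares the number of query-key scores computed by dense attention with a generic sparse scheme in which each query attends to a fixed budget of keys. The budget of 4,096 keys is a purely hypothetical value chosen for illustration, and causal masking is ignored.

```python
def dense_attention_pairs(n_tokens: int) -> int:
    """Query-key scores for dense attention: every token scores every token."""
    return n_tokens * n_tokens


def sparse_attention_pairs(n_tokens: int, keys_per_query: int) -> int:
    """Query-key scores when each query attends to at most `keys_per_query` keys."""
    return n_tokens * min(n_tokens, keys_per_query)


if __name__ == "__main__":
    keys_per_query = 4096  # hypothetical budget, NOT a documented MSA parameter
    for n in (8_000, 128_000, 1_000_000):
        dense = dense_attention_pairs(n)
        sparse = sparse_attention_pairs(n, keys_per_query)
        print(f"{n:>9,} tokens: dense={dense:.2e}  sparse={sparse:.2e}  "
              f"ratio={dense / sparse:,.0f}x")
```

Under these assumptions the dense cost grows with the square of the context length while the sparse cost grows linearly, which is why the gap widens as the window approaches 1M tokens.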
While the model supports a full 1M token window, MiniMax guarantees a minimum of 512K tokens of high-fidelity performance. This architecture is specifically optimized to solve the 'latency killer' in agentic loops: the massive re-prefilling time required when an agent makes repeated tool calls within a growing context. With MSA, prefilling speeds are optimized to keep agentic workflows fluid and responsive.
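The re-prefilling problem is easy to see with some arithmetic. The sketch below is an illustration only and does not model MSA or any MiniMax API behavior: it counts how many tokens must be prefilled over an agent run when every call reprocesses the whole context, versus when the already-seen prefix can be reused. The function name and the example numbers (a 100,000-token starting context, 2,000 tokens added per tool call, 50 calls) are hypothetical.

```python
def total_prefill_tokens(initial_context: int, tokens_per_call: int,
                         calls: int, reuse_prefix: bool) -> int:
    """Total tokens prefilled across `calls` model calls in an agent loop.

    After each call, the tool result appends `tokens_per_call` tokens.
    With `reuse_prefix`, only tokens not seen in the previous call are prefilled.
    """
    total = 0
    context = initial_context
    already_processed = 0
    for _ in range(calls):
        if reuse_prefix:
            total += context - already_processed
            already_processed = context
        else:
            total += context  # the whole context is reprocessed every call
        context += tokens_per_call
    return total


if __name__ == "__main__":
    full = total_prefill_tokens(100_000, 2_000, 50, reuse_prefix=False)
    reused = total_prefill_tokens(100_000, 2_000, 50, reuse_prefix=True)
    print(f"re-prefill every call: {full:,} tokens")
    print(f"reuse seen prefix:     {reused:,} tokens")
```

In this toy run, repeated full prefills process several million tokens while the prefix-reusing loop processes fewer than two hundred thousand, which shows why prefill speed dominates latency in long agentic sessions.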
MiniMax-M3 isn't just large; it's incredibly capable. In benchmark testing, the model has demonstrated a level of reasoning and coding proficiency that rivals the most advanced closed models. Most notably, on the BrowseComp benchmark, M3 achieved a score of 83.5, decisively surpassing the industry-leading Opus 4.7, which scored 79.3.
The model's strength lies in its autonomous task decomposition and multi-step reasoning. In complex coding environments, M3 excels at understanding entire repositories, identifying bugs across multiple files, and suggesting structural refactors. It is the first open model to simultaneously achieve frontier coding capabilities, a million-token context window, and native multimodal support.
MiniMax has introduced a tiered pricing structure that rewards efficiency, particularly for developers utilizing prompt caching. The pricing is split based on the context length, ensuring that users paying for massive 1M token windows are billed accurately for the scale of their workloads.
For standard workloads under 512k tokens, the costs remain highly competitive. For those pushing the limits of the 1M token window, the pricing scales accordingly. The inclusion of prompt caching at a fraction of the standard input cost makes M3 an ideal candidate for RAG-heavy applications and long-running agentic loops where context is frequently reused.
The versatility of MiniMax-M3 makes it a Swiss Army knife for modern AI engineering. Its massive context window makes it a premier choice for advanced Retrieval-Augmented Generation (RAG), where entire document libraries can be ingested without losing nuance. For software engineers, it serves as a pair programmer that can reason through massive codebases.
Furthermore, its native multimodality and agentic reasoning make it perfect for building autonomous agents. These agents can 'see' through multimodal inputs, 'think' through complex task decomposition, and 'act' by using tools to interact with the real world or digital environments. Whether you are building a complex research assistant or an automated DevOps agent, M3 provides the necessary foundation.
Developers can start integrating MiniMax-M3 immediately via the MiniMax API platform. The model is accessible through standard REST endpoints and is supported by official SDKs designed for rapid deployment. Given its open-weights nature, the community is also expected to release quantized versions for local deployment on high-end hardware.
To begin, visit the MiniMax developer portal to generate your API key and explore the documentation for specific implementation details regarding the MSA architecture and multimodal input formats.
API Pricing: input $0.60 / M tokens (≤ 512k), $1.20 / M tokens (> 512k); output $2.40 / M tokens (≤ 512k), $4.80 / M tokens (> 512k); context: 1M tokens
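As a rough guide to what these tiers mean in practice, the following Python sketch estimates the cost of a single request from the listed prices. It rests on several assumptions that the pricing line does not confirm: that "512k" means 512,000 tokens, that the tier is selected by the request's input length, and that the selected tier's rates apply to all input and output tokens in that request. The cached-input price is not stated here, so caching is left out. All names are hypothetical.

```python
TIER_BOUNDARY_TOKENS = 512_000  # assumption: "512k" = 512,000 tokens

# USD per million tokens, taken from the pricing line above.
PRICES_PER_MILLION = {
    "short": {"input": 0.60, "output": 2.40},  # input length <= 512k
    "long": {"input": 1.20, "output": 4.80},   # input length > 512k
}


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate one request's cost, assuming the tier depends on input length."""
    tier = "short" if input_tokens <= TIER_BOUNDARY_TOKENS else "long"
    rates = PRICES_PER_MILLION[tier]
    return (input_tokens * rates["input"]
            + output_tokens * rates["output"]) / 1_000_000


if __name__ == "__main__":
    for inp, out in ((200_000, 4_000), (800_000, 4_000)):
        print(f"{inp:,} in / {out:,} out: ${estimate_cost_usd(inp, out):.4f}")
```

If billing turns out to work differently, for example by charging only the tokens beyond the boundary at the higher rate, the tier selection in `estimate_cost_usd` is the one line to change.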