Xiaomi MiMo V2 Flash: The Open-Source Reasoning Powerhouse
Xiaomi releases MiMo V2 Flash, a 309B MoE reasoning model optimized for math and code, challenging GPT-5 benchmarks at a fraction of the cost.

Introduction
On December 16, 2025, Xiaomi stunned the global AI community with the official release of MiMo V2 Flash, marking a significant milestone in its foundation-model strategy. Unlike previous iterations that focused on general chat capabilities, this new model is explicitly engineered for high-speed reasoning, mathematical problem-solving, and complex code generation. As part of the expanding MiMo-V2 family, which includes the flagship 1-trillion-parameter Pro model, the Flash variant prioritizes efficiency without sacrificing intelligence.
This release signals Xiaomi's serious intent to compete directly with Western tech giants like OpenAI and Anthropic. By opening the weights of MiMo V2 Flash, the company aims to empower developers to build agentic systems within the Human X Car X Home ecosystem. The model's architecture is designed to handle the rigorous demands of enterprise software engineering and scientific computing, positioning it as a critical tool for the next generation of autonomous agents.
What makes this announcement particularly noteworthy is the balance between performance and accessibility. While the Pro version pushes the boundaries of raw parameter count, the Flash model delivers comparable reasoning capabilities through a more efficient sparse architecture. This approach ensures that developers can deploy high-level reasoning tasks on standard hardware clusters, democratizing access to enterprise-grade AI capabilities.
- Release Date: December 16, 2025
- Provider: Xiaomi AI Lab
- License: Open Source
- Primary Focus: Reasoning and Code Generation
Key Features & Architecture
At the core of MiMo V2 Flash lies a sophisticated Mixture of Experts (MoE) architecture comprising 309 billion parameters. This design allows the model to activate only the necessary experts for specific tasks, significantly reducing inference latency and computational costs compared to dense models. The architecture is optimized for sparse activation, ensuring that the model remains lightweight during deployment while maintaining the capacity for complex logical deductions.
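To make the sparse-activation idea concrete, here is a minimal sketch of top-k expert routing, the standard MoE pattern. Xiaomi has not published MiMo V2 Flash's router design, expert count, or top-k value, so every number below is illustrative rather than a description of the actual model.

```python
import numpy as np

def moe_forward(x, gate_w, experts, top_k=2):
    """Route one token embedding through only its top-k experts.

    x: (d,) token embedding; gate_w: (d, n_experts) router weights;
    experts: list of per-expert callables. Only top_k experts run,
    which is why a large MoE can stay cheap at inference time.
    """
    logits = x @ gate_w                # one router score per expert
    top = np.argsort(logits)[-top_k:]  # indices of the k highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()           # softmax over the selected experts only
    # Weighted sum of the chosen experts' outputs; all other experts are skipped.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Illustrative sizes only: 8 experts over 16-dimensional embeddings.
rng = np.random.default_rng(0)
d, n_experts = 16, 8
experts = [lambda x, W=rng.normal(size=(d, d)): x @ W for _ in range(n_experts)]
gate_w = rng.normal(size=(d, n_experts))
print(moe_forward(rng.normal(size=d), gate_w, experts).shape)  # (16,)
```

With top_k=2 of 8 experts, only a quarter of the expert parameters touch each token; this selective activation is the mechanism behind the latency and cost savings described above.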
The model features a massive context window of 128,000 tokens, enabling it to process extensive documentation, multi-file codebases, and long-form technical reports in a single pass. Additionally, MiMo V2 Flash supports multimodal inputs, allowing it to interpret diagrams and mathematical formulas alongside text. This capability is crucial for engineering applications where visual data often accompanies textual specifications.
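As a rough guide to what such a window holds, the sketch below checks whether a codebase fits into a single 128k-token prompt using the common ~4-characters-per-token heuristic. Actual token counts depend on the tokenizer, which Xiaomi has not detailed, so treat this purely as a back-of-the-envelope estimate; the 8,000-token reserve mirrors the model's listed maximum output.

```python
from pathlib import Path

CONTEXT_TOKENS = 128_000
CHARS_PER_TOKEN = 4  # rough heuristic; real tokenizers vary by language and content

def fits_in_context(repo_root: str, reserve_for_output: int = 8_000) -> bool:
    """Rough check: does this repo's Python source fit in one 128k-token prompt?"""
    chars = sum(len(p.read_text(errors="ignore"))
                for p in Path(repo_root).rglob("*.py"))
    return chars // CHARS_PER_TOKEN <= CONTEXT_TOKENS - reserve_for_output

print(fits_in_context("."))  # True if the whole tree fits with room for output
```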
Xiaomi emphasized the model's efficiency at its recent partner conference, highlighting its ability to run on consumer-grade GPUs with minimal overhead. The open-source nature of the release means that the community can inspect the weights and fine-tune the model for specific verticals, fostering a collaborative ecosystem around Xiaomi's hardware and software stack.
- Architecture: 309B MoE (Sparse)
- Context Window: 128k Tokens
- Modality: Text + Math + Code
- Inference Speed: High (Sparse Activation)
Performance & Benchmarks
In terms of raw performance, MiMo V2 Flash delivers results that rival the industry's top-tier closed models. On the MMLU benchmark, the model achieved a score of 88.5%, indicating strong general-knowledge coverage. The true differentiator, however, lies in its specialized reasoning capabilities: it excels in domains requiring step-by-step logical deduction and algorithmic thinking.
For developers, the HumanEval score of 92.1% is a standout metric, indicating exceptional proficiency at generating functional Python code. Furthermore, on the SWE-bench challenge, MiMo V2 Flash resolved 45% of the benchmark's real-world software issues, outperforming several earlier open-source models by a significant margin. These numbers suggest that the model is not just a chatbot but a viable tool for automated software engineering workflows.
Comparatively, the model approaches the performance of GPT-5.2 and Opus 4.6 on reasoning tasks at a fraction of the computational cost. This efficiency comes from the MoE structure, which activates only the experts relevant to each token during inference, making the model well suited to real-time agent interactions.
- MMLU Score: 88.5%
- HumanEval Score: 92.1%
- SWE-bench Pass Rate: 45%
- Reasoning Latency: Low
API Pricing
Xiaomi has adopted a competitive pricing strategy to encourage widespread adoption of the MiMo V2 Flash API. The input cost is set at $0.20 per million tokens, while the output cost is $0.60 per million tokens. This pricing structure is significantly lower than the industry standard for comparable reasoning models, making it attractive for startups and large-scale enterprise deployments.
Developers can also access a generous free tier that includes 100,000 tokens per month for testing and prototyping. This tier allows teams to validate their use cases before committing to paid API usage. The pricing model is designed to scale with usage, ensuring that high-volume applications remain cost-effective without compromising on the quality of the responses generated by the model.
For those looking to self-host, the open-source weights are available on major platforms like Hugging Face and ModelScope. While self-hosting incurs hardware costs, the efficiency of the 309B MoE architecture means that the operational expenditure can be lower than paying for API calls for high-volume internal tools.
- Input Price: $0.20 / M tokens
- Output Price: $0.60 / M tokens
- Free Tier: 100k tokens/month
- Self-Hosting: Open Weights Available
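As a quick sanity check on those rates, the sketch below estimates a monthly bill from projected token volumes. The per-token prices come from the published pricing above; how the 100k-token free tier splits between input and output is not specified, so the estimate conservatively ignores it.

```python
INPUT_PER_M = 0.20     # USD per million input tokens (published rate)
OUTPUT_PER_M = 0.60    # USD per million output tokens (published rate)
FREE_TOKENS = 100_000  # monthly free tier; its input/output split is unspecified,
                       # so this estimate conservatively leaves it out

def monthly_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate a monthly MiMo V2 Flash API bill in USD."""
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# Example: an internal tool sending 50M input and 10M output tokens per month.
print(f"${monthly_cost(50_000_000, 10_000_000):.2f}")  # $16.00
```

Even at tens of millions of tokens per month the bill stays in the tens of dollars, which is the scale at which the API tends to beat the hardware cost of self-hosting.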
Comparison Table
When evaluating MiMo V2 Flash against its closest competitors, the balance of cost and capability becomes evident. The table below highlights the key specifications that differentiate this model from established players in the market. Developers should consider these metrics when selecting the right tool for their specific reasoning or coding needs.
| Model | Context Window | Max Output | Input Cost ($/M tokens) | Output Cost ($/M tokens) | Key Strength |
| --- | --- | --- | --- | --- | --- |
| MiMo V2 Flash | 128k | 8k | $0.20 | $0.60 | Reasoning & Code |
| GPT-4o | 128k | 4k | $5.00 | $15.00 | General Chat |
| DeepSeek-V3 | 64k | 8k | $0.14 | $0.28 | Cost Efficiency |
| Claude 3.5 | 200k | 4k | $3.00 | $15.00 | Long Context |
Use Cases
The capabilities of MiMo V2 Flash make it ideal for a wide range of specialized applications. In the software development lifecycle, it can serve as a primary coding assistant that understands entire codebases, generates unit tests, and refactors legacy code with high accuracy. Its strong mathematical reasoning also makes it suitable for scientific research, where it can assist in deriving formulas or debugging complex simulations.
For autonomous agents, the model's ability to execute multi-step tasks efficiently is a game-changer. Developers can build agents that interact with APIs, manage databases, and execute scripts based on natural language commands. The open-source nature also allows for fine-tuning on domain-specific data, such as legal documents or medical records, expanding its utility beyond general-purpose tasks.
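The loop behind such agents is simple in outline, as the sketch below shows. Here `call_mimo` and the tool-call reply format are hypothetical stand-ins, since this release does not document a specific tool-calling schema; a real integration would follow whatever the official SDKs expose.

```python
import json

def run_agent(task: str, tools: dict, call_mimo) -> str:
    """Minimal agent loop: ask the model, run requested tools, feed results back.

    call_mimo is a hypothetical stand-in for the real API client. It is assumed
    to return either {"tool": name, "args": {...}} or {"answer": text}; the
    actual MiMo V2 Flash tool-call format may differ.
    """
    messages = [{"role": "user", "content": task}]
    for _ in range(10):  # hard cap on steps to avoid runaway loops
        reply = call_mimo(messages)
        if "answer" in reply:
            return reply["answer"]
        result = tools[reply["tool"]](**reply["args"])  # execute the requested tool
        messages.append({"role": "tool", "content": json.dumps(result)})
    return "Step limit reached without a final answer."
```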
Within the Xiaomi ecosystem, the model is integrated into smart home devices to provide more intelligent command processing. This allows users to execute complex home automation routines through simple voice commands, leveraging the model's reasoning to interpret ambiguous requests accurately.
- Software Engineering & Refactoring
- Scientific Research & Math
- Autonomous Agent Execution
- Smart Home Automation
Getting Started
Accessing MiMo V2 Flash is straightforward for developers looking to integrate it into their workflows. The model is available via the official Xiaomi AI API portal, where users can generate API keys and access the SDKs for Python, JavaScript, and Go. For open-source enthusiasts, the weights are hosted on Hugging Face under the official Xiaomi repository, allowing for immediate local deployment.
To begin using the API, developers can register for an account and obtain a free tier key. The SDKs provide simplified methods for handling streaming responses and managing context windows. Documentation is comprehensive, including examples for common reasoning tasks and code generation workflows that showcase the model's capabilities.
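As a concrete starting point, the snippet below sketches a chat-completion request over plain HTTPS. The endpoint path, payload schema, and model identifier are assumptions modeled on common OpenAI-style APIs, not Xiaomi's documented contract; check the docs behind api.xiaomi.ai for the real field names.

```python
import os
import requests

# Hypothetical endpoint and payload, modeled on common OpenAI-style chat APIs;
# the real paths and field names are defined by Xiaomi's official documentation.
API_URL = "https://api.xiaomi.ai/v1/chat/completions"  # assumed path

resp = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {os.environ['XIAOMI_API_KEY']}"},
    json={
        "model": "mimo-v2-flash",  # assumed model identifier
        "messages": [
            {"role": "user", "content": "Prove that the square root of 2 is irrational."}
        ],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])  # assumed response shape
```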
For enterprise users, Xiaomi offers dedicated support channels to assist with high-volume deployment and custom fine-tuning requirements. The combination of open-source flexibility and commercial API support ensures that MiMo V2 Flash can be adapted to both individual developer needs and large-scale corporate infrastructure.
- API: api.xiaomi.ai
- Weights: Hugging Face
- SDKs: Python, JS, Go
- Support: Enterprise Portal