Mistral AI Unveils Magistral Small: 24B Reasoning Powerhouse
Mistral AI releases Magistral Small, a 24B parameter reasoning model with extended thinking capabilities, open-sourced under Apache 2.0.

Introduction
Mistral AI officially announced the release of Magistral Small on June 5, 2025, a notable moment for open-source AI. This 24-billion-parameter model represents a significant step forward in reasoning capability compared to previous iterations in the Small series. Unlike standard chat models, Magistral Small is specifically engineered to handle complex logical tasks through an extended thinking process.
The release comes amidst a competitive landscape where hardware efficiency is paramount. By optimizing for cost-sensitive use cases without sacrificing intelligence, Mistral positions this model as a viable alternative to proprietary giants. Developers can now access high-performance reasoning tools without the heavy licensing costs associated with closed-source models.
- Release Date: 2025-06-05
- Model Type: Reasoning Model
- Open Source: Yes (Apache 2.0)
Key Features & Architecture
Magistral Small uses a dense 24-billion-parameter transformer architecture designed to balance compute efficiency with high-quality output generation. The model features a native context window of 128K tokens, allowing it to process extensive documentation and codebases in a single pass. Its standout feature is an 'extended thinking' mode, in which the model drafts intermediate reasoning steps before finalizing an answer (a sketch of how callers can separate this trace from the answer follows the list below).
Licensing remains a critical factor for enterprise adoption. Mistral has chosen the Apache 2.0 license for Magistral Small, ensuring that the weights are fully open for commercial use, modification, and distribution. This decision democratizes access to advanced reasoning capabilities while maintaining the sovereignty of European AI development.
- Parameters: 24B
- Context Window: 128K
- License: Apache 2.0
- Capability: Extended Thinking
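To illustrate how extended thinking typically surfaces to a caller, here is a minimal sketch that separates the reasoning trace from the final answer. It assumes the model wraps its draft reasoning in `<think>...</think>` tags, a common convention in Mistral's reasoning prompts; the exact delimiters can vary by model version and chat template, so treat them as an assumption to verify.

```python
import re

def split_reasoning(raw_response: str) -> tuple[str, str]:
    """Split a model response into (reasoning_trace, final_answer).

    Assumes the extended-thinking draft is wrapped in <think>...</think>
    tags; adjust the delimiters if your template differs.
    """
    match = re.search(r"<think>(.*?)</think>", raw_response, flags=re.DOTALL)
    if match is None:
        # No trace found; treat the whole text as the answer.
        return "", raw_response.strip()
    reasoning = match.group(1).strip()
    answer = raw_response[match.end():].strip()
    return reasoning, answer

# Example: show only the final answer to end users, log the trace for debugging.
trace, answer = split_reasoning(
    "<think>17 is prime because no integer from 2 to 4 divides it.</think>17 is prime."
)
print(answer)  # -> "17 is prime."
```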
Performance & Benchmarks
Magistral Small demonstrates exceptional strength in STEM domains, achieving a score of 85.4 on the MMLU benchmark and outperforming several proprietary models in its class. Its reasoning engine is particularly robust in mathematical problem-solving and logical deduction, where it scores well above comparably sized general-purpose models.
Code generation metrics are similarly strong. On HumanEval, the model achieves an 88% pass rate, indicating solid capability for software engineering tasks. In the SWE-bench evaluation, Magistral Small resolves complex repository issues at a high rate, underscoring its utility for developer-centric workflows.
- MMLU Score: 85.4
- HumanEval Pass Rate: 88%
- SWE-bench: high issue-resolution rate
API Pricing
For those choosing the hosted API route, Mistral has introduced competitive pricing that aligns with the model's hardware-efficient design goals. Input costs $0.20 per million tokens and output costs $0.60 per million tokens, significantly lower than many enterprise-grade competitors and well suited to high-volume inference (a quick cost sketch follows the list below).
A free tier is available for developers to test the model's capabilities without immediate financial commitment. This tier includes a daily token cap of 100,000 tokens, sufficient for prototyping and small-scale experimentation. The value proposition remains strong when compared to the licensing fees of closed-source alternatives.
- Input Cost: $0.20 / 1M tokens
- Output Cost: $0.60 / 1M tokens
- Free Tier: 100K tokens/day
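As a quick sanity check on the published rates, the sketch below estimates a bill from token volumes. The $0.20/$0.60 per-million figures come from the list above; the workload numbers are made-up inputs for illustration.

```python
INPUT_COST_PER_M = 0.20   # USD per 1M input tokens (from the pricing above)
OUTPUT_COST_PER_M = 0.60  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated API cost in USD for a given token volume."""
    return (input_tokens / 1_000_000) * INPUT_COST_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_COST_PER_M

# Hypothetical monthly workload: 50M input tokens and 10M output tokens.
monthly = estimate_cost(50_000_000, 10_000_000)
print(f"Estimated monthly cost: ${monthly:.2f}")  # -> $16.00
```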
Model Comparison
When placed against direct competitors, Magistral Small offers a unique blend of reasoning power and cost efficiency. While larger models like GPT-4o offer higher raw intelligence, they come with significantly higher computational costs. Magistral Small bridges the gap by providing specialized reasoning capabilities at a fraction of the price.
Open-source alternatives like Llama 3.1 70B bring far larger parameter counts but often lack the specialized extended-thinking architecture found in Magistral. The comparison table below highlights the key differentiators across context windows, pricing, and specific strengths for enterprise decision-makers.
- Best for Cost Efficiency
- Strong Reasoning Focus
- Open Source Weights
Use Cases
Magistral Small is best suited for applications requiring deep reasoning without the overhead of massive parameter counts. Software development workflows benefit greatly from its ability to generate and debug complex code snippets autonomously. It is also highly effective for scientific research, where parsing large datasets and performing logical deductions are critical tasks.
Agentic workflows and RAG (Retrieval-Augmented Generation) systems can leverage the 128K context window to maintain coherence over long document interactions. The extended thinking feature lets agents plan steps before execution, reducing hallucination rates in complex task chains (a minimal plan-then-execute sketch follows the list below).
- Code Generation and Debugging
- Scientific Reasoning
- Long-Context RAG Systems
- Autonomous Agents
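To make the plan-before-execution idea concrete, here is a schematic agent loop. It is a sketch rather than a real Mistral integration: `call_model` is a stub standing in for any chat-completion call, and the plan/act split simply reuses the `<think>` convention described earlier.

```python
def call_model(prompt: str) -> str:
    """Stand-in for a chat-completion call to Magistral Small.

    A real agent would call the API or a local vLLM server here; the
    stub keeps the control flow runnable on its own.
    """
    return ("<think>1. Retrieve the relevant document.\n"
            "2. Extract the key points.\n"
            "3. Write the summary.</think>Done.")

def run_agent(task: str, max_steps: int = 5) -> str:
    """Plan first, then execute: the reasoning trace becomes the plan."""
    raw = call_model(f"Plan how to solve: {task}")
    plan = raw.split("</think>", 1)[0].removeprefix("<think>").strip()
    result = ""
    for step in plan.splitlines()[:max_steps]:
        # Each plan line becomes its own focused sub-call, which keeps
        # prompts short and reduces drift across the task chain.
        result = call_model(f"Task: {task}\nCurrent step: {step.strip()}")
    return result

print(run_agent("Summarize the deployment guide"))
```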
Getting Started
Accessing Magistral Small is straightforward for both API users and local deployers. Developers can reach the model through the Mistral AI API using the official SDKs for Python, JavaScript, and Go. For local deployment, the weights are available on Hugging Face under the Apache 2.0 license, enabling private on-premise inference.
To begin, visit the official documentation portal to obtain an API key. Alternatively, download the weights from Hugging Face and run the model locally with vLLM or llama.cpp. This flexibility supports both cloud-native and sovereign AI strategies (quickstart sketches for both routes follow the list below).
- API Endpoint: api.mistral.ai
- Weights: Hugging Face
- SDKs: Python, JS, Go
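For the hosted route, a minimal Python quickstart might look like the following. It assumes the current `mistralai` Python client and a `magistral-small-latest` model alias; confirm both against the official docs, as model names and SDK surfaces change over time.

```python
import os
from mistralai import Mistral  # pip install mistralai

# Assumes MISTRAL_API_KEY is set; obtain a key from the documentation portal.
client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

response = client.chat.complete(
    model="magistral-small-latest",  # assumed alias; verify in the docs
    messages=[
        {"role": "user", "content": "A bat and a ball cost $1.10 in total. "
                                    "The bat costs $1 more than the ball. "
                                    "How much does the ball cost?"},
    ],
)
print(response.choices[0].message.content)
```

For local inference, vLLM's offline API can load the weights directly from Hugging Face. The repository id and the Mistral tokenizer mode shown below are assumptions to verify against the model card.

```python
from vllm import LLM, SamplingParams

# Repo id assumed from the Hugging Face release; check the model card.
llm = LLM(model="mistralai/Magistral-Small-2506", tokenizer_mode="mistral")
out = llm.generate(["Why is the sky blue?"], SamplingParams(max_tokens=256))
print(out[0].outputs[0].text)
```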
Comparison
| Model           | Context | Max Output | Input $/M | Output $/M | Strength           |
|-----------------|---------|------------|-----------|------------|--------------------|
| Magistral Small | 128K    | 8K         | $0.20     | $0.60      | Reasoning & Cost   |
| GPT-4o          | 128K    | 16K        | $5.00     | $15.00     | General Purpose    |
| Llama 3.1 70B   | 128K    | 8K         | $0.59     | $1.40      | Open Source Power  |