
Mistral Small 3.0: The Open-Source Frontier Model Arrives

Mistral AI releases Small 3.0 with Apache 2.0 licensing and state-of-the-art efficiency, giving developers a 24B-parameter model that challenges Big Tech rivals.

January 15, 2025

Introduction

Mistral AI officially unveiled the Mistral Small 3.0 model on January 15, 2025, marking a significant shift in the open-weight landscape. The release addresses growing demand for efficient, high-performance models that rival proprietary giants without restrictive licensing. For developers, it delivers a powerful tool for local deployment and fine-tuning under the permissive Apache 2.0 license.

The industry reached an inflection point on the road toward artificial general intelligence in 2024, and with this release Mistral is closing in on its Big AI rivals. Unlike previous iterations, Small 3.0 is designed to deliver state-of-the-art performance in a compact package. It is not just an update; it is a strategic move to democratize access to high-quality reasoning capabilities.

This model bridges the gap between the efficiency of small models and the intelligence of frontier models. By releasing the weights under the Apache 2.0 license, Mistral ensures that the community can build on this foundation freely, setting a new standard for open-source AI development in 2025.

  • Release Date: 2025-01-15
  • License: Apache 2.0
  • Parameters: 24 Billion

Key Features & Architecture

The architecture of Mistral Small 3.0 focuses on density and efficiency. Its 24B parameters balance computational cost against output quality, and a mixture-of-experts (MoE) structure activates only the necessary components during inference, cutting latency significantly. This makes the model well suited to edge devices and smaller clusters.

Context window capabilities have been expanded to support long-form reasoning tasks. The model handles complex documents and extended conversation histories without losing coherence. Additionally, the model includes native multimodal capabilities, allowing it to process text alongside structured data inputs effectively.
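As a rough, back-of-the-envelope illustration of what a 128k-token window holds, the sketch below checks whether a batch of documents fits in a single request. The 4-characters-per-token ratio and the reserved output budget are assumptions for illustration, not official Mistral figures; use a real tokenizer in production.

```python
# Rough capacity check for a 128k-token context window.
CONTEXT_WINDOW = 128_000
CHARS_PER_TOKEN = 4        # heuristic average for English text (assumption)
RESERVED_OUTPUT = 4_096    # tokens kept free for the model's reply

def estimate_tokens(text: str) -> int:
    """Crude character-based estimate; swap in a real tokenizer for production."""
    return len(text) // CHARS_PER_TOKEN

def fits_in_context(documents: list[str]) -> bool:
    """True if all documents plus the output budget fit in one request."""
    total = sum(estimate_tokens(d) for d in documents)
    return total + RESERVED_OUTPUT <= CONTEXT_WINDOW

report = "quarterly results " * 3_000   # ~54k characters of placeholder text
print(fits_in_context([report]))        # True: ~13.5k tokens fits easily
```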

Key technical specifications include a 4,096-token maximum output limit (see the comparison table below) sized for practical completion workloads, and a recent training-data cutoff that keeps the model's knowledge current. Developers get a robust foundation for building custom agents and RAG pipelines.

  • Parameters: 24B
  • Context Window: 128k
  • Architecture: MoE
  • Multimodal: Yes

Performance & Benchmarks

Performance benchmarks show a clear improvement over previous versions. On the MMLU benchmark, Mistral Small 3.0 achieves a score of 82.5%, surpassing many proprietary 7B models. HumanEval scores indicate strong coding capabilities, with a pass rate of 68%. These numbers place it firmly in the top tier of open-source models available today.

SWE-bench results demonstrate the model's ability to handle real-world software engineering tasks. The model scores 45% on the hard track, showing it can understand and modify codebases effectively. Reasoning benchmarks also show significant gains, particularly in math and logic problems compared to the 2024 iteration.

When compared with competitors like Llama 3.1 8B and Gemma 3, Mistral Small 3.0 holds its ground on efficiency. Larger models still win on raw capability, but Small 3.0 offers better inference speed per dollar, making it the preferred choice for cost-sensitive deployments that still require high accuracy.

  • MMLU Score: 82.5%
  • HumanEval: 68%
  • SWE-bench Hard: 45%
  • Context: 128k

API Pricing

Mistral AI has structured the API pricing to be competitive with other major providers. Input costs are set at $0.20 per million tokens, while output costs are $0.60 per million tokens. This pricing model allows for predictable budgeting for high-volume applications. Developers can scale their usage without unexpected cost spikes.

A free tier lets hobbyists and small startups test the model's capabilities; it includes a monthly token cap that resets each billing cycle. For enterprise users, volume discounts are available on request. Taken together, the pricing makes Mistral Small 3.0 one of the most cost-effective options for production workloads.
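At the listed rates, budgeting is simple arithmetic. The sketch below estimates a monthly bill; the traffic figures are invented purely for illustration.

```python
# Monthly cost estimate at the listed Mistral Small 3.0 API rates.
INPUT_PRICE = 0.20 / 1_000_000    # dollars per input token
OUTPUT_PRICE = 0.60 / 1_000_000   # dollars per output token

def monthly_cost(requests: int, avg_in: int, avg_out: int) -> float:
    """Dollar cost for a month of traffic, given average tokens per request."""
    return requests * (avg_in * INPUT_PRICE + avg_out * OUTPUT_PRICE)

# Hypothetical workload: 100k requests/month, ~1,500 input / ~400 output tokens each.
print(f"${monthly_cost(100_000, 1_500, 400):,.2f}")  # -> $54.00
```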

  • Input Price: $0.20/M tokens
  • Output Price: $0.60/M tokens
  • Free Tier: Available

Use Cases

The model is best suited for applications requiring high reasoning within a constrained compute budget. Coding assistants and automated debugging tools benefit from the strong HumanEval scores. Developers can integrate the model into IDE plugins to provide real-time suggestions and refactoring advice.

Chat interfaces and virtual agents also leverage the 128k context window effectively. The model maintains context over long conversations, making it suitable for customer support bots. RAG pipelines gain accuracy as the model understands complex document structures better than smaller 7B alternatives.
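As a concrete sketch of the RAG pattern, the snippet below embeds a few chunks, retrieves the closest one by cosine similarity, and answers with that context. It assumes the mistralai Python SDK (v1) and the mistral-embed embedding model; the model name "mistral-small-latest" is an assumption, so confirm identifiers against the official docs.

```python
# Minimal RAG sketch with the mistralai Python SDK (v1, assumed API surface).
import math
from mistralai import Mistral

client = Mistral(api_key="YOUR_API_KEY")

chunks = [
    "Mistral Small 3.0 is released under the Apache 2.0 license.",
    "The model exposes a 128k-token context window.",
]

def embed(texts: list[str]) -> list[list[float]]:
    """Embed texts with mistral-embed and return their vectors."""
    resp = client.embeddings.create(model="mistral-embed", inputs=texts)
    return [item.embedding for item in resp.data]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

question = "What license does Mistral Small 3.0 use?"
chunk_vecs, q_vec = embed(chunks), embed([question])[0]
best = max(range(len(chunks)), key=lambda i: cosine(chunk_vecs[i], q_vec))

# Answer grounded in the single best-matching chunk.
answer = client.chat.complete(
    model="mistral-small-latest",  # assumed identifier -- check the docs
    messages=[{"role": "user",
               "content": f"Context: {chunks[best]}\n\nQuestion: {question}"}],
)
print(answer.choices[0].message.content)
```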

  • Coding Assistants
  • Long-Context Chat
  • RAG Pipelines
  • Agent Orchestration

Getting Started

Accessing Mistral Small 3.0 is straightforward for developers familiar with the Mistral ecosystem. The API endpoint is available via the standard Mistral AI platform. SDKs are provided for Python, JavaScript, and Go to simplify integration. Documentation is comprehensive, including examples for fine-tuning and inference.
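A first call through the Python SDK might look like the sketch below. It assumes the current mistralai package (v1) and the "mistral-small-latest" alias; verify both against the platform docs before relying on them.

```python
# Minimal chat completion against the Mistral API (assumed v1 SDK surface).
import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

response = client.chat.complete(
    model="mistral-small-latest",  # assumed alias for Small 3.0 -- check the docs
    messages=[
        {"role": "system", "content": "You are a concise coding assistant."},
        {"role": "user", "content": "Write a one-line Python function that reverses a string."},
    ],
    max_tokens=256,
)
print(response.choices[0].message.content)
```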

For local deployment, the model weights are available on Hugging Face under the Apache 2.0 license. You can run the model using vLLM or TGI for optimized inference. This ensures that teams can keep data private and avoid API egress fees for internal tools.
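For local inference, a vLLM sketch could look like the following. The Hugging Face repo id is an assumption (check Mistral's organization page for the exact name), and an unquantized 24B model needs roughly 48 GB or more of GPU memory in bf16.

```python
# Serve the open weights locally with vLLM (offline batch inference).
from vllm import LLM, SamplingParams

# Assumed repo id -- confirm the exact Small 3.0 name on Hugging Face.
llm = LLM(model="mistralai/Mistral-Small-24B-Instruct-2501")
params = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(
    ["Summarize the Apache 2.0 license in one sentence."], params
)
print(outputs[0].outputs[0].text)
```

The same weights can also be exposed as an OpenAI-compatible endpoint with vLLM's `vllm serve <repo-id>` command, which keeps internal traffic off third-party APIs.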

  • Platform: mistral.ai
  • SDKs: Python, JS, Go
  • Weights: Hugging Face

Comparison

Model             | Context | Max Output | Input $/M | Output $/M | Strength
------------------|---------|------------|-----------|------------|-------------------------
Mistral Small 3.0 | 128k    | 4096       | $0.20     | $0.60      | Efficiency & Apache 2.0
Llama 3.1 8B      | 8k      | 4096       | $0.15     | $0.45      | Raw Speed
Gemma 3 12B       | 128k    | 8192       | $0.25     | $0.70      | Long Context
Claude 3.5 Sonnet | 200k    | 4096       | $3.00     | $15.00     | Reasoning Quality



Sources

  • Mistral Closes in on Big AI Rivals
  • Mistral AI Official News
  • The Most Innovative Companies in AI for 2025