
Mistral Medium 3.5: The 128B Open-Source Flagship for 2026

Mistral AI releases Mistral Medium 3.5, a 128B dense open-source model merging reasoning, coding, and instruction following into a single efficient architecture.

April 29, 2026
Figure: Mistral Medium 3.5 128B benchmark comparison chart (via Hugging Face)

Introduction: A Historic Milestone for Open-Source AI

Mistral Medium 3.5 represents a pivotal shift in the open-source large language model landscape. Released on April 29, 2026, it sets a new standard for performance and accessibility in the developer community. By merging instruction following, reasoning, and coding capabilities into a unified architecture, it marks a historic milestone for the open-weights movement.

This release signals that high-performance inference does not require proprietary restrictions or massive compute clusters. Mistral has condensed frontier-level intelligence into a dense 128B-parameter model that remains practical to run in self-hosted environments. This democratization gives smaller teams access to enterprise-grade AI without relying on closed ecosystems.

The significance extends beyond raw parameter count: advanced agentic workflows are integrated directly into the model's core. Developers can run complex reasoning tasks locally or via API with consistent reliability. The model is designed to bridge the gap between lightweight efficiency and heavyweight capability.

  • Released: 2026-04-29
  • Category: Open Source LLM
  • Provider: Mistral AI

Key Features & Architecture Details

The architecture is defined by its 128B dense parameters, optimized for high-quality reasoning without the complexity of MoE routing. This dense structure ensures consistent performance across diverse tasks, from natural language understanding to complex code generation. The model supports open weights under a modified MIT license, encouraging broad adoption and community-driven improvements.

Hardware efficiency is a core design pillar. The model is engineered to run self-hosted on as few as four GPUs, significantly lowering the barrier to entry for local deployment. This efficiency allows developers to maintain data privacy while utilizing cutting-edge inference speeds. The context window supports up to 128K tokens, enabling long-form document analysis and extended conversation history.

Multimodal capabilities are integrated natively, allowing the model to process and generate diverse data types seamlessly. This flexibility makes it suitable for RAG pipelines and multimodal applications. The combination of dense architecture and optimized hardware requirements ensures that the model remains competitive with larger, more expensive proprietary alternatives.

  • Parameters: 128B Dense
  • License: Modified MIT
  • Inference: 4 GPU Minimum
  • Context Window: 128K Tokens
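The four-GPU claim can be sanity-checked with back-of-envelope arithmetic. The sketch below estimates weight memory for a 128B dense model at several precisions; the precision options and the exclusion of KV-cache overhead are simplifying assumptions, not published deployment specs.

```python
# Back-of-envelope VRAM estimate for serving a 128B dense model
# across 4 GPUs. Weight-only figures; KV cache and activation
# overhead (often 10-30% extra) are deliberately excluded.

PARAMS = 128e9   # 128B dense parameters
NUM_GPUS = 4

def weights_gb(bytes_per_param: float) -> float:
    """Total weight memory in GB at a given precision."""
    return PARAMS * bytes_per_param / 1e9

for label, bpp in [("FP16/BF16", 2.0), ("FP8/INT8", 1.0), ("INT4", 0.5)]:
    total = weights_gb(bpp)
    per_gpu = total / NUM_GPUS
    print(f"{label}: {total:.0f} GB total, {per_gpu:.0f} GB per GPU")
```

At FP8 the weights fit in 32 GB per GPU, which is why a four-GPU node is a plausible floor; FP16 needs 64 GB per card, still within reach of 80 GB accelerators.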

Performance & Benchmark Analysis

Benchmark results indicate that Mistral Medium 3.5 performs at or above 90% of Claude Sonnet 3.7 across the board. This includes significant gains in MMLU, HumanEval, and SWE-bench evaluations. The model demonstrates robust reasoning capabilities that rival top-tier closed-source competitors while maintaining open weights transparency.

In coding tasks, the model excels at JSON generation and complex algorithm implementation. It shows strong performance on SWE-bench, validating its utility for software-engineering workflows. The combination of multimodal input and tool use makes it not just a text generator but a functional agent capable of interacting with external systems.

While its scores in some specialized reasoning domains trail the absolute frontier, it remains a strong mid-tier option for content and general-purpose tasks. This makes it a versatile choice for businesses that need high reliability without the cost of top-tier proprietary models. Its consistency across evaluation metrics confirms its stability.

  • MMLU Score: 90% of Sonnet 3.7
  • HumanEval: High Efficiency
  • SWE-bench: Strong Engineering Support
  • Multimodal: Native Support

API Pricing & Cost Efficiency

Mistral AI has committed to transparent pricing that makes commercial adoption viable. The API pricing structure is set at $1.50 per million tokens for input and $7.50 per million tokens for output. This pricing model is significantly lower than many proprietary alternatives, offering a cost-effective solution for high-volume applications.

The cost efficiency is further enhanced by the ability to self-host the model for internal use. For cloud deployments, the predictable pricing allows for accurate budgeting and resource allocation. There is no free tier available for the API, but the open weights license allows for free self-hosted inference without token costs.

Unique pricing features include optimized tokenization that reduces unnecessary overhead during processing. This ensures that developers pay only for the computational value they receive. The pricing remains competitive even as the model scales, making it suitable for both startups and large enterprises.

  • Input Price: $1.50 / 1M tokens
  • Output Price: $7.50 / 1M tokens
  • Free Tier: N/A
  • Self-Hosted: Free License
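The published rates make budgeting a simple calculation. The helper below applies the $1.50 / $7.50 per-million-token prices to an arbitrary monthly volume; the 500M/50M example workload is illustrative, not a benchmark.

```python
# Monthly API cost at the published rates:
# $1.50 per 1M input tokens, $7.50 per 1M output tokens.

INPUT_PER_M = 1.50
OUTPUT_PER_M = 7.50

def api_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a given token volume."""
    return (input_tokens / 1e6) * INPUT_PER_M \
         + (output_tokens / 1e6) * OUTPUT_PER_M

# Example: a RAG service pushing 500M input / 50M output tokens per month.
monthly = api_cost(500_000_000, 50_000_000)
print(f"${monthly:,.2f} / month")  # $1,125.00 / month
```

Note the 5:1 output-to-input price ratio: workloads that generate long completions (code, reports) are dominated by output cost, while retrieval-heavy workloads with short answers stay close to the input rate.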

Use Cases: Agents, Coding, and Workflows

The model powers the new Mistral Vibe remote agents for asynchronous cloud coding sessions. Developers can hand long-running coding tasks off to these agents and review the results later, with AI assistance that understands project context deeply. It is particularly suited to complex coding tasks where reasoning is critical for debugging and optimization.

In the Le Chat interface, the model drives Work mode for multi-step agentic task execution. This includes parallel tool calling capabilities that speed up workflow automation. Sessions can be spawned from the CLI or Le Chat, and local CLI sessions can be teleported to the cloud for remote access.

Best suited applications include enterprise RAG systems, autonomous coding assistants, and multi-agent orchestration. The model handles JSON and structured data generation with high accuracy, making it ideal for data pipelines. It is also effective for general chat and content creation where consistency is paramount.

  • Mistral Vibe Remote Agents
  • Le Chat Work Mode
  • Parallel Tool Calling
  • CLI to Cloud Teleportation
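Parallel tool calling means the model may request several tool invocations in a single turn. The sketch below assembles a request body in the common OpenAI-style chat-completions schema that Mistral's API follows; the `mistral-medium-3.5` model id and the two tool definitions are illustrative assumptions, not confirmed names.

```python
import json

# Sketch of a request body enabling parallel tool calls.
# Model id and tool names are illustrative placeholders.

tools = [
    {
        "type": "function",
        "function": {
            "name": "run_tests",  # hypothetical tool
            "description": "Run the project's test suite and return failures.",
            "parameters": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "read_file",  # hypothetical tool
            "description": "Return the contents of a file in the repo.",
            "parameters": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        },
    },
]

payload = {
    "model": "mistral-medium-3.5",  # illustrative model id
    "messages": [{"role": "user", "content": "Why is test_auth failing?"}],
    "tools": tools,
    "tool_choice": "auto",  # let the model decide which tools to call
}

print(json.dumps(payload, indent=2)[:60])
```

With `tool_choice` left on auto, a single response can carry multiple tool-call entries (e.g. reading a file and running tests at once), which is what makes workflow automation faster than sequential round-trips.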

Getting Started: Access and Integration

Developers can access the model immediately via the official API endpoint. SDKs are available for Python, JavaScript, and Go, simplifying integration into existing stacks. The documentation provides clear examples for both synchronous and asynchronous request handling to optimize throughput.

For local deployment, users can clone the repository and follow the setup guide for the four-GPU configuration. This ensures that privacy-conscious teams can run the model on-premise without data leakage risks. The open weights license facilitates fine-tuning for specific domain tasks like legal or medical text processing.

Community support is robust, with active forums and GitHub issues for troubleshooting. Mistral provides regular updates on performance patches and new features. Staying connected with the official blog ensures that users are informed of the latest advancements in the model family.

  • API Endpoint: Available
  • SDKs: Python, JS, Go
  • Local Setup: 4 GPU Guide
  • License: Modified MIT
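For teams integrating without an SDK, a first request can be built with the standard library alone. The sketch below targets Mistral's chat-completions endpoint; the `mistral-medium-3.5` model id is an assumption pending the official docs, and the send step is left commented out so the example stays network-free.

```python
import json
import os
import urllib.request

# Minimal chat-completions request sketch (stdlib only).
# The model id below is illustrative, not confirmed.
API_URL = "https://api.mistral.ai/v1/chat/completions"

def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Assemble (but do not send) a chat-completions request."""
    body = json.dumps({
        "model": "mistral-medium-3.5",  # illustrative model id
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_request(
    "Summarize the 3.5 release notes.",
    os.environ.get("MISTRAL_API_KEY", "sk-test"),
)
# urllib.request.urlopen(req)  # uncomment to actually send
print(req.full_url)
```

The same payload shape works with the official SDKs; separating request construction from sending also makes the integration easy to unit-test.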

API Pricing: Input $1.50 / Output $7.50 / Context 128K


Sources

Mistral AI Official Blog: Vibe Remote Agents & Mistral Medium 3.5

Mistral AI News: Mistral Medium 3

Mistral Medium 3.5 128B - BenchLM

Mistral Coding Stack Announcement