Cohere's new North Mini Code model brings elite reasoning and a massive 256K context window to the open-source coding ecosystem.

The landscape of open-source coding models just shifted significantly. On June 9, 2026, Cohere officially released North Mini Code, a specialized model designed to bridge the gap between lightweight local assistants and heavy-duty enterprise reasoning engines. For developers who have long felt caught between the convenience of managed APIs and the privacy of self-hosted models, this release offers a compelling middle ground.
This isn't just another fine-tuned LLM; it is a purpose-built tool for the modern software engineering lifecycle. Whether you are performing complex refactors on a massive monorepo or building autonomous agentic pipelines, North Mini Code provides the reasoning depth required to handle sophisticated logic without the massive computational overhead of larger generalist models.
North Mini Code is built on a highly efficient architecture that prioritizes both intelligence and throughput. While it is categorized as an open-weights model, its underlying design allows it to punch far above its weight class in terms of reasoning capabilities. It is a text-in, text-out model, optimized specifically for the nuances of programming languages and technical documentation.
One of the most standout features is the massive context window. Supporting up to 256K tokens, the model can ingest entire codebases, long-form technical specifications, or extensive documentation sets in a single prompt. Furthermore, it supports an impressive output capacity of up to 64K tokens, making it capable of generating large-scale boilerplate, entire modules, or comprehensive test suites in one go.
To understand the caliber of North Mini Code, one must look at the benchmarks. The model achieves a remarkable 75.7% on the GPQA Diamond, a benchmark known for testing advanced generalist reasoning. This places North Mini Code in the same tier as many of the world's most advanced proprietary reasoning models, proving that 'mini' does not mean 'limited.'
In specialized coding and technical domains, the numbers remain consistently strong. The model demonstrates high proficiency in scientific coding and terminal-based tasks, making it a versatile tool for DevOps and research engineers alike. Its ability to follow complex instructions is further validated by its high scores across several industry-standard evaluations.
When North Mini Code was first announced, it was briefly listed with a $0/M token pricing model. However, Cohere has since transitioned the model to a paid pricing structure to support ongoing development and infrastructure. Developers should note that while the model is open-source and can be self-hosted, using the managed Cohere API offers the convenience of high availability and managed scaling.
Because pricing models for high-demand models can fluctuate based on enterprise agreements and regional availability, we recommend visiting the official Cohere pricing page to get the most current rates for your specific use case. For those looking to minimize costs, self-hosting via local hardware remains a viable and powerful option.
The versatility of North Mini Code makes it an ideal candidate for several high-impact engineering workflows. Because of its Apache 2.0 license, it is particularly well-suited for enterprise environments where data privacy and local execution are non-negotiable. It excels in scenarios requiring deep understanding of large-scale code structures.
Developers can leverage the model for automated pull request reviews, where it can analyze changes against existing logic to catch subtle bugs. It is also an exceptional tool for test generation and large-scale refactoring projects. When deployed via tools like Ollama or llama.cpp, it serves as a highly capable local code assistant that lives directly on your machine.
Getting started with North Mini Code is straightforward, whether you prefer the ease of an API or the control of local deployment. For those wanting to integrate it into existing cloud workflows, the Cohere SDK provides a seamless way to call the model via API endpoints. This is the fastest way to prototype agentic coding pipelines.
For the local-first developer, the open-weights availability means you can pull the model through popular frameworks like Ollama. This allows you to run the model on your own hardware—such as a single H100 or high-end consumer GPUs—ensuring that your proprietary code never leaves your controlled environment.
API Pricing — Context: 256K tokens