
MiniCPM-o 4.5: 9B Multimodal AI Model Release

OpenBMB launches MiniCPM-o 4.5, a 9B parameter multimodal LLM matching Gemini 2.5 Flash performance with on-device capabilities.

February 8, 2026
[Image: MiniCPM-o 4.5 official release image]

Introduction

OpenBMB has officially unveiled MiniCPM-o 4.5, a groundbreaking multimodal large language model designed for the edge. Released on February 8, 2026, this model represents a significant leap forward in on-device AI processing, challenging the dominance of massive cloud-based giants with a compact 9 billion parameter architecture. Unlike previous iterations that relied on brute-force scaling, MiniCPM-o 4.5 leverages architectural innovation to deliver state-of-the-art benchmark scores while maintaining the efficiency required for local deployment.

The significance of this release lies in its ability to handle full-duplex real-time processing of audio, images, and video simultaneously. This capability is crucial for the next generation of AI agents that require immediate interaction without latency. By prioritizing efficiency without sacrificing intelligence, OpenBMB has positioned MiniCPM-o 4.5 as a viable alternative for developers seeking privacy-focused, high-performance multimodal solutions.

This model is not merely an incremental update but a redefinition of what is possible with lightweight multimodal models. It integrates end-to-end capabilities that were previously reserved for much larger systems, making it a top contender for edge computing applications. The release date coincides with a surge in demand for local AI, making MiniCPM-o 4.5 a timely addition to the developer ecosystem.

  • Release Date: 2026-02-08
  • Parameters: 9 Billion
  • Type: Open Source Multimodal LLM
  • Base Architecture: Qwen3-8B

Key Features & Architecture

MiniCPM-o 4.5 is built on a robust foundation, utilizing the Qwen3-8B architecture as its core backbone. This choice ensures a strong baseline for language understanding while allowing the model to focus its resources on multimodal integration. The model is constructed in an end-to-end fashion, integrating SigLip2 for vision, Whisper-medium for audio processing, and CosyVoice2 for speech synthesis. This modular yet unified approach allows for seamless interaction between modalities.

A standout feature is its support for full-duplex real-time audio, image, and video processing. This means the model can listen, see, and speak simultaneously without interruption, a critical requirement for live streaming applications and real-time assistance tools. The 9B parameter count is optimized for on-device inference, ensuring that even smartphones with limited resources can run the model efficiently without significant thermal throttling.

The architecture also emphasizes privacy, as the entire inference pipeline can occur locally on the user's hardware. This eliminates the need for constant cloud connectivity, which is a major selling point for enterprise applications handling sensitive data. The integration of CosyVoice2 ensures that voice interactions are natural and low-latency, bridging the gap between text-based LLMs and human-like communication.

  • Architecture: Qwen3-8B based
  • Vision: SigLip2
  • Audio: Whisper-medium
  • Voice: CosyVoice2
  • Latency: Low for on-device inference
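The modular-but-unified pipeline above can be pictured as encoders feeding a shared backbone, with a speech synthesizer on the output side. The sketch below is purely illustrative: the class and function names are stand-ins for the roles played by SigLip2, Whisper-medium, Qwen3-8B, and CosyVoice2, not the actual MiniCPM-o 4.5 implementation.

```python
# Illustrative composition of a multimodal pipeline; all components are stand-ins.
from dataclasses import dataclass


@dataclass
class Encoded:
    modality: str
    tokens: list[float]  # stand-in for a sequence of embedding vectors


def encode_image(pixels: list[float]) -> Encoded:
    # Role played by SigLip2 in MiniCPM-o 4.5: pixels -> vision tokens.
    return Encoded("vision", [p * 0.5 for p in pixels])


def encode_audio(samples: list[float]) -> Encoded:
    # Role played by Whisper-medium: waveform -> audio tokens.
    return Encoded("audio", [s * 0.5 for s in samples])


def backbone(streams: list[Encoded]) -> str:
    # Role played by the Qwen3-8B backbone: consumes interleaved modality tokens.
    merged = [t for s in streams for t in s.tokens]
    return f"response conditioned on {len(merged)} multimodal tokens"


def synthesize_speech(text: str) -> bytes:
    # Role played by CosyVoice2: response text -> output waveform.
    return text.encode("utf-8")


reply = backbone([encode_image([0.1, 0.2]), encode_audio([0.3])])
audio_out = synthesize_speech(reply)
```

The point of the end-to-end design is that all three encoders share one token space, so the backbone conditions on vision and audio jointly rather than through separate round-trips.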

Performance & Benchmarks

In terms of raw performance, MiniCPM-o 4.5 achieves an average score of 78.2 on OpenCompass, a comprehensive average over eight popular benchmarks. This places it on par with Gemini 2.5 Flash, a remarkable feat for a model with only 9 billion parameters. The model has been rigorously tested against competitors such as GPT-4o, Qwen3-VL-8B, and InternVL-3.5-8B, consistently matching or outperforming them in specific vision and reasoning tasks.

The model excels in image understanding and video comprehension, areas where many smaller models typically struggle. Its ability to process video streams in real-time allows for dynamic content analysis, making it suitable for surveillance, education, and interactive media applications. The benchmark results indicate that the model maintains high accuracy even when processing complex, multi-step reasoning tasks that require integrating visual and textual context.

HumanEval and SWE-bench scores further validate its coding and problem-solving capabilities, demonstrating that raw parameter count is not the sole driver of capability in this architecture. The model's efficiency means it can run inference at a fraction of the cost of larger models while delivering comparable results. This balance of speed and accuracy is what makes MiniCPM-o 4.5 a preferred choice for developers building resource-constrained applications.

  • OpenCompass Score: 78.2
  • Comparison: Gemini 2.5 Flash level
  • Strengths: Vision, Video, Reasoning
  • Efficiency: High throughput on edge devices
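For context on how the headline number is derived, OpenCompass-style aggregates are a simple mean over the constituent benchmarks. In the sketch below, the per-benchmark scores are invented placeholders chosen to average to the published 78.2, and the benchmark names are typical OpenCompass members assumed for illustration; only the 78.2 aggregate comes from the release.

```python
# OpenCompass-style aggregate: a simple mean over constituent benchmarks.
# Per-benchmark scores are INVENTED placeholders; only the 78.2 average is
# from the release, and the benchmark names are assumptions.
def opencompass_average(scores: dict[str, float]) -> float:
    return sum(scores.values()) / len(scores)


placeholder_scores = {
    "MMBench": 80.0,
    "MMMU": 76.0,
    "MathVista": 79.0,
    "HallusionBench": 77.0,
    "AI2D": 82.0,
    "OCRBench": 81.0,
    "MMVet": 75.6,
    "MMStar": 75.0,
}

avg = opencompass_average(placeholder_scores)  # 78.2 with these placeholders
```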

API Pricing & Access

As an open-source model, MiniCPM-o 4.5 does not carry an official API subscription fee from OpenBMB. Developers can access the model weights directly for free, allowing for self-hosted deployment without per-token costs. However, if utilizing a third-party inference platform that hosts the model, pricing will depend on the provider's specific infrastructure costs. This open-access model encourages community contributions and fine-tuning, fostering a robust ecosystem around the technology.

For enterprise users, the value proposition lies in the elimination of data egress fees and latency costs associated with cloud APIs. By running MiniCPM-o 4.5 locally, companies can save significantly on operational expenditures (OpEx) while maintaining strict data sovereignty. The model's lightweight nature means it can be deployed on existing hardware without the need for expensive GPU clusters, further reducing capital expenditures.

  • Model Type: Open Source
  • Official API Cost: N/A
  • Deployment: Self-hosted or Edge
  • Privacy: Local Inference
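A quick way to reason about the OpEx claim is to compare metered API spend against the electricity cost of a self-hosted box. Every figure in this sketch is a hypothetical placeholder, not a quoted price; real deployments would also factor in hardware depreciation and ops overhead.

```python
# Back-of-envelope OpEx comparison: metered cloud API vs local electricity.
# All figures are HYPOTHETICAL placeholders, not quoted prices.
def monthly_api_cost(tokens_per_month: int, usd_per_million_tokens: float) -> float:
    """Cost of a metered API at a flat per-token rate."""
    return tokens_per_month / 1_000_000 * usd_per_million_tokens


def monthly_power_cost(watts: float, hours: float, usd_per_kwh: float) -> float:
    """Electricity cost of a self-hosted box running continuously."""
    return watts / 1000 * hours * usd_per_kwh


api_spend = monthly_api_cost(500_000_000, 0.30)   # 500M tokens at $0.30/M (assumed rate)
power_spend = monthly_power_cost(300, 730, 0.15)  # 300 W box, ~730 h/month, $0.15/kWh
```

Under these assumed numbers the self-hosted power bill comes in well below the metered spend, which is the shape of the argument the paragraph above is making.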

Comparison Table

When comparing MiniCPM-o 4.5 against current market leaders, the differences in efficiency and capability become clear. While larger models offer longer context windows, MiniCPM-o 4.5 matches their performance on critical multimodal tasks while using significantly fewer resources. The points below summarize the key distinctions between MiniCPM-o 4.5 and its primary competitors in the multimodal space.

  • MiniCPM-o 4.5 offers the best balance of speed and cost.
  • Competitors often require more hardware for similar tasks.

Use Cases

The versatility of MiniCPM-o 4.5 makes it suitable for a wide range of applications. In coding, it can assist developers by analyzing screenshots of errors and suggesting fixes in real-time. For reasoning tasks, it can process diagrams and charts to explain complex data visually, enhancing its utility in education and business intelligence.

In the realm of chat and agents, the full-duplex capability allows for natural conversation flows where the agent can observe the user's environment while speaking. RAG (Retrieval-Augmented Generation) systems can benefit from the model's ability to index and retrieve visual documents alongside text, creating a more comprehensive knowledge base for specialized queries.

  • Coding Assistance
  • Visual Reasoning
  • Real-time Chat Agents
  • RAG Systems
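The multimodal RAG pattern described above amounts to indexing text snippets and image captions side by side, retrieving by similarity, and handing the hits to the model as context. The sketch below uses a toy bag-of-words embedding as a stand-in for real multimodal embeddings; the corpus entries are made up for illustration.

```python
# Minimal multimodal-RAG retrieval sketch: text and image captions indexed
# together. Bag-of-words embeddings are a toy stand-in for real embeddings.
import math
from collections import Counter


def embed(text: str) -> Counter:
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


# (kind, content) pairs: plain text alongside captions of indexed images.
corpus = [
    ("text", "quarterly revenue grew 12 percent year over year"),
    ("image-caption", "bar chart of quarterly revenue by region"),
    ("text", "the office relocated to a new building in march"),
]


def retrieve(query: str, k: int = 2) -> list[tuple[str, str]]:
    q = embed(query)
    ranked = sorted(corpus, key=lambda d: cosine(q, embed(d[1])), reverse=True)
    return ranked[:k]


hits = retrieve("revenue chart")  # top hit is the image caption
```

The retrieved hits (including image references) would then be packed into the model's multimodal context window for the final answer.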

Getting Started

Accessing MiniCPM-o 4.5 is straightforward. The model weights are available on Hugging Face under the openbmb organization, including AWQ-quantized versions for faster inference. The Ollama community has also integrated the model, so users can run it locally with a single command.

For those interested in the full capabilities, the OpenBMB demo page provides a web-based interface to test the model's multimodal features directly in the browser. Developers should clone the GitHub repository to access the source code and fine-tuning scripts, enabling customization for specific verticals. Documentation is available on the official website to guide integration into existing pipelines.

  • Hugging Face: openbmb/MiniCPM-o-4_5-awq
  • Ollama: openbmb/minicpm-o4.5
  • GitHub: OpenBMB/MiniCPM-o
  • Demo: openbmb.github.io/MiniCPM-o-Demo


Sources

GitHub - OpenBMB/MiniCPM-o

Hugging Face - MiniCPM-o-4_5-awq

OpenBMB MiniCPM-o 4.5 Demo

AIGazine - OpenBMB Unveils 9B MiniCPM-o 4.5