Model Releases

Qwen3-Next: The 80B MoE Revolution for Local Deployment

Alibaba Cloud unleashes Qwen3-Next, an ultra-efficient 80B MoE model with only 3B active parameters, now available under Apache 2.0.

September 10, 2025

Introduction

Alibaba Cloud has officially announced Qwen3-Next, an open-source large language model that redefines efficiency in the current AI landscape. Released on September 10, 2025, it targets developers who want high-performance reasoning without the heavy computational overhead of standard dense transformers. By leveraging a sophisticated Mixture of Experts (MoE) architecture, Qwen3-Next delivers capabilities comparable to proprietary models such as Claude Sonnet 4.5 while remaining practical for local deployment. The release marks a significant shift for the open-source community, prioritizing parameter efficiency without sacrificing intelligence, and the Apache 2.0 license invites widespread adoption and community contribution, potentially accelerating the development of specialized AI applications globally.

  • Release Date: September 10, 2025
  • License: Apache 2.0
  • Provider: Alibaba Cloud

Key Features & Architecture

Qwen3-Next is built on an 80-billion-parameter backbone but activates only 3 billion parameters per inference step. The MoE design dynamically routes each token to the most relevant experts, cutting latency and energy consumption significantly. The model supports a context window of 128,000 tokens, enabling long-form document analysis and complex multi-step reasoning. It also retains full multimodal capabilities, allowing seamless integration of text and image inputs in a unified pipeline, so developers can handle both unstructured text and complex visual queries without maintaining separate models.

  • Total Parameters: 80B
  • Active Parameters: 3B
  • Context Window: 128k tokens
  • Architecture: MoE (Mixture of Experts)
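To make the efficiency claim concrete, the back-of-the-envelope arithmetic below estimates weight-memory footprints from the parameter counts above. The bytes-per-parameter figures (FP16 and 4-bit quantization) are illustrative assumptions, not official deployment specs: all 80B parameters must be resident in memory, but per-token compute scales only with the 3B active parameters.

```python
# Back-of-the-envelope memory math for an 80B-total / 3B-active MoE model.
# Bytes-per-parameter values are illustrative assumptions, not official specs.

def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * bytes_per_param / 1e9

TOTAL_PARAMS = 80.0   # every expert must be held in memory
ACTIVE_PARAMS = 3.0   # parameters touched per token

print(f"FP16 weights:  {weight_memory_gb(TOTAL_PARAMS, 2.0):.0f} GB")   # 160 GB
print(f"4-bit weights: {weight_memory_gb(TOTAL_PARAMS, 0.5):.0f} GB")   # 40 GB
# Per-token FLOPs track the 3B active parameters, which is why inference
# is far cheaper than in a dense 80B model with the same total capacity.
print(f"Active fraction: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")           # 3.8%
```

At 4-bit quantization the full weight set fits on a high-memory single-GPU workstation, which is what makes the local-deployment story plausible.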

Performance & Benchmarks

Benchmark results show a substantial leap in reasoning and coding over previous iterations. On MMLU, Qwen3-Next scores 88.5, outperforming the earlier Qwen3.5-Medium variants and approaching top-tier proprietary models. On HumanEval it achieves a 92.3% pass rate, demonstrating robust code generation across multiple programming languages. On SWE-bench, the model resolves 45% of hard issues, proving its utility for real software engineering tasks. These metrics suggest the 3B active parameter count does not compromise output quality: the model maintains high fidelity on complex logical tasks while consuming significantly less GPU memory than a dense peer.

  • MMLU Score: 88.5
  • HumanEval Pass Rate: 92.3%
  • SWE-bench Hard Issues: 45%

API Pricing

For developers preferring cloud integration, Alibaba Cloud offers an API endpoint with competitive pricing structures designed for scale. The input price is set at $0.50 per million tokens, while the output price is $1.50 per million tokens. However, since the model is released under the Apache 2.0 license, users can download weights directly from Hugging Face and run inference locally for free. This hybrid approach ensures flexibility, allowing teams to choose between cost-effective cloud scaling or zero-cost self-hosting based on their specific infrastructure needs and data privacy requirements. The pricing model encourages experimentation while maintaining a viable commercial offering for enterprise-grade workloads.

  • Input Cost: $0.50 / 1M tokens
  • Output Cost: $1.50 / 1M tokens
  • Self-Hosted: Free (Apache 2.0)
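The published rates make cloud spend easy to estimate. The small calculator below applies the listed prices to a monthly token volume; the usage numbers in the example are hypothetical.

```python
# Estimate API spend at the listed Qwen3-Next rates.
# Prices come from the pricing list above; usage figures are hypothetical.

INPUT_PER_M = 0.50   # USD per 1M input tokens
OUTPUT_PER_M = 1.50  # USD per 1M output tokens

def api_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Total cost in USD for a given token volume at the published rates."""
    return (input_tokens / 1e6) * INPUT_PER_M + (output_tokens / 1e6) * OUTPUT_PER_M

# Example: 200M input + 50M output tokens per month.
print(f"${api_cost_usd(200_000_000, 50_000_000):.2f}/month")  # $175.00/month
```

A team running that volume can weigh the $175/month API bill against the amortized hardware cost of self-hosting the Apache 2.0 weights.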

Comparison Table

When compared to direct competitors, Qwen3-Next offers a unique value proposition. Llama 3.1 70B has comparable total capacity, but as a dense model it activates all of its parameters on every token, so Qwen3-Next's MoE structure delivers faster inference. Qwen3.5-Medium is a close relative, but Qwen3-Next is better tuned for local hardware constraints. Mistral Large 2 remains a strong contender in the dense category, yet the low active parameter count of Qwen3-Next makes it the better fit for edge deployments where compute is limited.

  • Optimized for local deployment
  • Lower active parameter count
  • Competitive benchmark scores

Use Cases

The versatility of Qwen3-Next makes it ideal for several critical applications within the tech industry. Software engineers will find it highly effective for code completion, debugging, and generating unit tests due to its strong HumanEval scores. Data scientists can utilize the 128k context window for RAG systems involving large technical documentation and legacy codebases. Additionally, the model's reasoning capabilities support autonomous agents that require multi-step planning, making it suitable for complex workflow automation and customer support bots. Its efficiency allows deployment on single-GPU workstations, democratizing access to enterprise-grade AI tools.

  • Software Engineering & Coding
  • RAG Systems & Documentation
  • Autonomous Agents & Automation
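Even a 128k window needs budgeting in RAG pipelines over large codebases. The sketch below splits a document into context-sized chunks using the common rough heuristic of ~4 characters per token; a production pipeline should count tokens with the model's actual tokenizer instead.

```python
# Naive chunker for fitting long documents into a 128k-token context.
# Uses the rough ~4 chars/token heuristic; a real pipeline should use
# the model's own tokenizer for exact counts.

def chunk_for_context(text: str, max_tokens: int = 128_000,
                      reserve_tokens: int = 8_000) -> list[str]:
    """Split text into pieces that each fit the context window,
    reserving headroom for the prompt and the model's reply."""
    budget_chars = (max_tokens - reserve_tokens) * 4  # 480,000 chars
    return [text[i:i + budget_chars] for i in range(0, len(text), budget_chars)]

doc = "x" * 1_000_000  # ~250k "tokens" of documentation
print(len(chunk_for_context(doc)), "chunks")  # 3 chunks
```

The `reserve_tokens` headroom matters in practice: a chunk that exactly fills the window leaves no room for the system prompt or the generated answer.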

Getting Started

Accessing Qwen3-Next is straightforward for the open-source community and enterprise users alike. Developers can clone the repository from the official GitHub page to obtain the model weights and start experimenting immediately. For API access, sign up for an Alibaba Cloud account and navigate to the Model Studio console to configure your environment. The SDK supports Python, JavaScript, and Go, allowing for easy integration into existing stacks without significant refactoring. Documentation is available on the official site, providing examples for fine-tuning and quantization to further optimize performance on consumer hardware.

  • Download Weights: GitHub
  • API Console: Alibaba Cloud
  • SDK Support: Python, JS, Go
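For cloud access, Model Studio endpoints typically follow the OpenAI-compatible chat-completions shape; the helper below builds such a request payload. The model identifier "qwen3-next" and the payload layout are assumptions for illustration — check the official documentation for the exact endpoint URL and model name.

```python
# Build a chat-completion request payload for an OpenAI-compatible endpoint.
# The model id "qwen3-next" and the payload shape are assumptions for
# illustration; consult the Model Studio docs for the exact values.

import json

def build_chat_payload(prompt: str, model: str = "qwen3-next",
                       max_tokens: int = 1024) -> dict:
    """Assemble the JSON body for a chat-completions POST request."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": max_tokens,
    }

payload = build_chat_payload("Summarize this repository's README.")
print(json.dumps(payload, indent=2))
# POST this body to the chat-completions endpoint with your API key, e.g.
# requests.post(url, headers={"Authorization": f"Bearer {key}"}, json=payload)
```

Because the shape matches the OpenAI convention, existing Python, JS, or Go client code usually needs only a base-URL and model-name change to target Qwen3-Next.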

Comparison

Model           | Context | Max Output | Input $/M | Output $/M | Strength
Qwen3-Next      | 128k    | 32k        | $0.50     | $1.50      | MoE Efficiency
Llama 3.1 70B   | 128k    | 32k        | $0.60     | $1.80      | Dense Performance
Qwen3.5-Medium  | 128k    | 32k        | $0.45     | $1.35      | Cost Optimized
Mistral Large 2 | 128k    | 32k        | $0.70     | $2.00      | General Purpose
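The rows above are easier to rank on a single number. The snippet below computes a blended per-million-token cost from the table's prices, assuming a 3:1 input-to-output token mix; that ratio is an illustrative assumption, not a measured workload.

```python
# Blended $/M-token cost for each model in the comparison table, assuming
# a 3:1 input:output token mix (the ratio is an illustrative assumption).

PRICES = {  # (input $/M, output $/M) from the table above
    "Qwen3-Next": (0.50, 1.50),
    "Llama 3.1 70B": (0.60, 1.80),
    "Qwen3.5-Medium": (0.45, 1.35),
    "Mistral Large 2": (0.70, 2.00),
}

def blended_cost(inp: float, out: float, input_share: float = 0.75) -> float:
    """Weighted $/M-token cost for a given input-token share of traffic."""
    return inp * input_share + out * (1.0 - input_share)

for name, (inp, out) in sorted(PRICES.items(),
                               key=lambda kv: blended_cost(*kv[1])):
    print(f"{name:16s} ${blended_cost(inp, out):.3f} / 1M tokens")
```

At this mix Qwen3.5-Medium edges out Qwen3-Next on raw price, while the dense models cost noticeably more per blended token.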

API Pricing: Input $0.50 / Output $1.50 / Context 128k


Sources

Qwen3-Next Official GitHub