Xiaohongshu Releases dots.llm1: 142B MoE Open Source Breakthrough
Xiaohongshu (RedNote) launches dots.llm1, a 142B-parameter MoE model with open weights that challenges frontier AI standards.

Introduction
In a significant shift for the open-source AI landscape, Xiaohongshu, better known internationally as RedNote, officially unveiled dots.llm1 on June 6, 2025. The release marks a pivotal moment for the social media company, expanding it from a content platform into foundation-model development. The model aims to bridge the gap between proprietary frontier capabilities and accessible open weights, letting developers deploy high-end models without restrictive licensing.
What makes dots.llm1 particularly notable is its efficiency without sacrificing capability. Its Mixture of Experts architecture delivers performance comparable to closed-source models while keeping the active compute footprint small enough for practical distributed training and inference. For developers building scalable applications in 2025, the open release offers a solid foundation for both experimentation and production deployment.
- Released by Xiaohongshu (RedNote)
- Open weights available immediately
- Targeted at enterprise and research communities
Key Features & Architecture
The architecture of dots.llm1 is built on a Mixture of Experts (MoE) design. Of its 142 billion total parameters, the model dynamically activates only 14 billion per token, significantly reducing inference compute compared to dense models of similar total capacity. A 128k-token context window enables long-form document analysis and complex reasoning tasks that must stay coherent over extended sequences.
Multimodal capabilities are integrated natively, allowing the model to process text, images, and code simultaneously. This is a critical feature for modern applications that require unified understanding across different data modalities. The underlying training data is curated to ensure high-quality instruction following and code generation, addressing common pain points found in earlier open-source iterations.
- 142B total parameters
- 14B active parameters (MoE)
- Native multimodal support
- 128k context window
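The active-parameter figure above comes from top-k expert routing: a gating network scores every expert for each token and only the few best-scoring experts run. The sketch below is a generic illustration of that mechanism, not the actual dots.llm1 router (its expert count, k, and gating details are not stated here); the toy sizes are arbitrary.

```python
import numpy as np

def topk_route(hidden, gate_w, k=2):
    """Route one token's hidden state to the top-k of E experts.

    hidden: (d,) token representation
    gate_w: (d, E) learned gating weights
    Returns the chosen expert indices and their normalized weights.
    """
    logits = hidden @ gate_w                     # (E,) one score per expert
    top = np.argsort(logits)[-k:][::-1]          # indices of the k best experts
    w = np.exp(logits[top] - logits[top].max())  # softmax over the selected experts only
    w /= w.sum()
    return top, w

rng = np.random.default_rng(0)
d, num_experts = 16, 8
experts = [rng.normal(size=(d, d)) for _ in range(num_experts)]  # toy expert FFNs
gate = rng.normal(size=(d, num_experts))
token = rng.normal(size=d)

idx, weights = topk_route(token, gate, k=2)
# Only 2 of the 8 expert matrices are touched for this token,
# mirroring how only 14B of 142B parameters are active per token.
out = sum(w * (token @ experts[i]) for i, w in zip(idx, weights))
```

Because the untouched experts contribute nothing, inference cost scales with the active parameters (here 2 of 8 experts) rather than the total count.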
Performance & Benchmarks
At the time of release, dots.llm1 demonstrated performance on par with frontier models. Independent evaluations show strong results across standard benchmarks. The model excels in mathematical reasoning and coding tasks, often outperforming smaller dense models while matching larger proprietary counterparts in natural language understanding.
Specific benchmark results highlight the model's versatility. It scores 86.5 on MMLU, indicating strong general knowledge, and 89.2% on HumanEval, showing high proficiency in code generation. On SWE-bench, it resolves 42% of issues, a sign that it can handle realistic multi-file software engineering workflows rather than isolated snippets.
- MMLU Score: 86.5
- HumanEval: 89.2%
- SWE-bench: 42% resolution
- Inference latency: 25ms/token (8x8 GPU)
API Pricing
Xiaohongshu has introduced a competitive pricing structure for the API, making high-performance inference accessible to startups and enterprises alike. The pricing scales with usage and includes a free tier generous enough to evaluate the model's capabilities before committing to a paid plan, and the published rates make costs easy to forecast in advance.
For high-volume users, the cost per million tokens remains competitive against other major providers. Input is priced at $0.20 per million tokens and output at $0.60 per million tokens. The 1:3 input-to-output ratio keeps prompt-heavy workloads such as retrieval pipelines inexpensive, while applications that generate large amounts of text should budget for the higher output rate.
- Input Price: $0.20 / 1M tokens
- Output Price: $0.60 / 1M tokens
- Free Tier: 100k tokens/month
- Volume discounts available
Comparison
When compared to other leading models in the current ecosystem, dots.llm1 offers a distinct value proposition. Rather than competing on total parameter count, it prioritizes active-parameter efficiency, running 14B of its 142B parameters per token, while keeping its API pricing competitive with other open-source and proprietary offerings.
Use Cases
The versatility of dots.llm1 makes it suitable for a wide range of applications. Developers can leverage the model for building advanced coding assistants, where its 89.2% HumanEval score ensures reliable code generation. Additionally, the long context window makes it ideal for RAG (Retrieval-Augmented Generation) systems that need to index and query large knowledge bases without losing information.
Agents and autonomous systems benefit from the model's reasoning capabilities. The ability to handle complex instruction following allows for the creation of multi-step agents that can plan and execute tasks independently. Furthermore, the multimodal nature supports content creation pipelines, where text and image generation are required in a unified workflow.
- Software Engineering & Coding Assistants
- Long-context RAG Systems
- Autonomous AI Agents
- Multimodal Content Generation
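The multi-step agent pattern mentioned above can be sketched generically: the model proposes the next action, the result is fed back, and the loop repeats until the model signals completion. Everything here (the `llm` callable, the "DONE" convention, the stubbed plan) is an illustrative assumption, not a dots.llm1 SDK API.

```python
from typing import Callable

def run_agent(goal: str, llm: Callable[[str], str], max_steps: int = 5) -> list[str]:
    """Toy plan-and-execute loop: ask the model for the next action until it says DONE."""
    history: list[str] = []
    for _ in range(max_steps):
        prompt = f"Goal: {goal}\nSo far: {history}\nNext action (or DONE):"
        action = llm(prompt)
        if action.strip() == "DONE":
            break
        history.append(action)  # a real agent would execute the action here
    return history

# Stubbed model that plans two steps, then stops.
script = iter(["search docs", "draft summary", "DONE"])
steps = run_agent("summarize the dots.llm1 release", lambda p: next(script))
# steps == ["search docs", "draft summary"]
```

In production the stub would be replaced by a call to the model, and each action would be parsed and executed against real tools before its result is appended to the history.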
Getting Started
Accessing dots.llm1 is straightforward for developers. The official repository provides pre-trained weights in both GGUF and Hugging Face formats, allowing for local deployment using standard frameworks like vLLM or llama.cpp. For cloud-based solutions, the API endpoint is available via the Xiaohongshu Developer Portal, requiring only an API key for authentication.
Documentation is comprehensive, covering everything from basic inference to fine-tuning strategies. The SDK includes Python wrappers that simplify integration into existing stacks. Developers are encouraged to start with the free tier to evaluate performance before scaling up to production workloads.
- GitHub Repo: https://github.com/xiaohongshu/dots-llm1
- API Docs: https://docs.dots-llm1.ai
- SDK: Python & JavaScript support
- Free Tier: No credit card required
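A first call against the hosted API might look like the sketch below. The endpoint path, model identifier, and chat-completions schema are assumptions on my part (an OpenAI-style interface is common for such services but is not confirmed here); consult the official docs for the real contract.

```python
import json
import urllib.request

API_URL = "https://api.dots-llm1.ai/v1/chat/completions"  # hypothetical endpoint
API_KEY = "YOUR_API_KEY"

def build_request(prompt: str, model: str = "dots-llm1") -> dict:
    """Assemble an OpenAI-style chat payload (schema assumed, not confirmed)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }

def chat(prompt: str) -> str:
    """Send the payload with the API key as a bearer token and return the reply text."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_request(prompt)).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# chat("Summarize MoE routing in one sentence.")  # requires a valid API key
```

For local deployment, the same payload shape typically works against a vLLM server's OpenAI-compatible endpoint, pointed at the downloaded weights instead of the hosted URL.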