Laguna-XS.2: Poolside's Open-Source Coding Model Release
Poolside unveils Laguna-XS.2, a 33B MoE coding model for local deployment and agentic workflows.

Introduction
Poolside has announced the release of Laguna-XS.2, a coding model designed to bring serious software engineering capability to local hardware. Released on April 28, 2026, the model marks a significant milestone in the open-weight space, bridging the gap between enterprise-grade performance and consumer hardware. Unlike previous iterations that required massive cloud clusters, Laguna-XS.2 is engineered for practical deployment: it runs directly on a developer's machine without compromising reasoning capability. The release reflects a broader shift toward democratizing high-performance AI for individual engineers and small teams who need robust coding assistance without recurring infrastructure costs, and it sets a new bar for agentic coding by prioritizing efficiency and accessibility over raw parameter count.
This announcement is particularly timely as the industry moves towards more specialized models for specific engineering tasks. The combination of open-source licensing and local runnability sets a new precedent for how coding assistants are delivered to the end-user.
Key Features & Architecture
The architecture of Laguna-XS.2 is a 33-billion-parameter Mixture-of-Experts (MoE) design in which only 3 billion parameters are activated per token, drastically reducing the compute and memory needed per forward pass while retaining the capacity of the full model. The model supports native reasoning, interleaving thinking between tool calls to improve complex task execution. It uses Sliding Window Attention with per-head gating in 30 of its 40 layers, keeping attention costs manageable on long-context inputs, and the KV cache is quantized to FP8 to further reduce memory per token. Together these choices make the model compact enough to run locally on a Mac with 36 GB of RAM, a feat previously unattainable with models of this scale. The weights are released under the Apache 2.0 license, encouraging widespread adoption and modification by the community.
The technical specifications focus heavily on reducing inference costs and memory usage without sacrificing intelligence. This allows developers to experiment with advanced coding tasks on consumer-grade hardware.
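To make the memory story concrete, here is a back-of-the-envelope sketch of the FP8 KV-cache footprint at the full 128K context. The layer split (30 sliding-window, 10 full-attention) comes from the announcement; the KV head count, head dimension, and window size are illustrative assumptions, since those hyperparameters have not been published.

```python
# Rough FP8 KV-cache sizing for Laguna-XS.2 at 128K context.
# Layer counts (30 sliding-window + 10 full-attention of 40) are from the
# release; kv_heads, head_dim, and window are ASSUMED illustrative values.

FP8_BYTES = 1          # one byte per cached value in FP8
KV_HEADS = 8           # assumption: grouped-query attention with 8 KV heads
HEAD_DIM = 128         # assumption: typical head dimension
CONTEXT = 128 * 1024   # the advertised 128K-token context window
WINDOW = 4 * 1024      # assumption: sliding-window size for the SWA layers
FULL_LAYERS = 10       # full-attention layers (40 total - 30 SWA)
SWA_LAYERS = 30        # sliding-window attention layers

def kv_bytes_per_token() -> int:
    """Bytes for one token's K and V vectors in one layer, stored in FP8."""
    return 2 * KV_HEADS * HEAD_DIM * FP8_BYTES

full = FULL_LAYERS * CONTEXT * kv_bytes_per_token()  # grows with context length
swa = SWA_LAYERS * WINDOW * kv_bytes_per_token()     # capped at the window size

print(f"full-attention layers: {full / 2**30:.2f} GiB")
print(f"sliding-window layers: {swa / 2**30:.2f} GiB")
print(f"total KV cache:        {(full + swa) / 2**30:.2f} GiB")
```

Under these assumptions the cache stays under 3 GiB even at the full 128K context, consistent with the claim that the model, once weight-quantized, fits on a 36 GB machine.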
Performance & Benchmarks
Performance benchmarks highlight the model's strength in coding and reasoning tasks. It achieves 68.2% on SWE-bench Verified and 62.4% on SWE-bench Multilingual, with 44.5% on SWE-bench Pro and 30.1% on Terminal-Bench 2.0. The model was trained on 30 trillion tokens using the Muon optimizer, giving it broad knowledge coverage. Its 128K context window can hold entire codebases, while the 8K output-token limit leaves ample room for detailed responses. These metrics place it competitively against larger closed-source models in specific engineering domains.
The strong SWE-bench Verified score indicates real capability on human-validated software engineering problems drawn from actual repositories, which matters for automated testing and debugging scenarios.
API Pricing
For a limited time, the model is free to use via the poolside API and OpenRouter, with no input or output costs during the promotional period, making it an ideal choice for testing and integration. The context window supports 128K tokens, allowing for extensive session history. The free tier removes financial barriers for developers who want to benchmark the model against their existing workflows before the promotion expires.
Developers can access the model without credit card details during this period, facilitating rapid prototyping and integration into production pipelines.
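As a minimal sketch of the free-tier access path, the snippet below sends a chat request through OpenRouter's OpenAI-compatible endpoint using the openai Python SDK. The model identifier poolside/laguna-xs.2 is an assumption for illustration; check OpenRouter's model listing for the actual ID.

```python
# Minimal chat completion via OpenRouter's OpenAI-compatible API.
# The model ID below is an ASSUMPTION; confirm it on openrouter.ai.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",   # OpenRouter's OpenAI-compatible endpoint
    api_key=os.environ["OPENROUTER_API_KEY"],
)

response = client.chat.completions.create(
    model="poolside/laguna-xs.2",  # hypothetical ID for this release
    messages=[
        {"role": "user", "content": "Write a Python function that reverses a linked list."},
    ],
    max_tokens=2048,  # well within the model's 8K output limit
)
print(response.choices[0].message.content)
```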
Use Cases
Laguna-XS.2 is best suited for agentic coding workflows and long-horizon software engineering tasks. It excels in scenarios requiring multi-step reasoning, such as debugging complex systems or refactoring legacy code. Its support for vLLM, Transformers, TRT-LLM, and Ollama ensures compatibility with existing deployment stacks. Developers can leverage this model for RAG applications, chat interfaces, and automated testing pipelines where code generation is paramount. The model's ability to interleave thinking with tool calls makes it particularly effective for autonomous agents that need to execute code sequences.
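To illustrate that pattern, here is a sketch of a minimal agent loop against an OpenAI-compatible endpoint, in which the model alternates between reasoning and tool execution until it stops requesting tools. The run_tests tool, the model ID, and the endpoint wiring are all assumptions for illustration, not poolside's published agent API.

```python
# Sketch of an agentic loop: the model interleaves thinking with tool calls
# until no further tools are requested. Tool and model ID are ASSUMPTIONS.
import subprocess
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="...")

TOOLS = [{
    "type": "function",
    "function": {
        "name": "run_tests",
        "description": "Run the project's test suite and return its output.",
        "parameters": {"type": "object", "properties": {}, "required": []},
    },
}]

def run_tests() -> str:
    proc = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return proc.stdout + proc.stderr

messages = [{"role": "user", "content": "Fix the failing test in this repo."}]
while True:
    reply = client.chat.completions.create(
        model="poolside/laguna-xs.2",  # hypothetical ID
        messages=messages,
        tools=TOOLS,
    )
    msg = reply.choices[0].message
    messages.append(msg)              # keep the assistant turn in the history
    if not msg.tool_calls:            # no tool requested: the agent is done
        print(msg.content)
        break
    for call in msg.tool_calls:       # execute each requested tool call
        result = run_tests() if call.function.name == "run_tests" else ""
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": result,
        })
```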
It is also ideal for environments where latency and cost are critical constraints, such as embedded systems or edge computing scenarios where cloud inference is not feasible.
Getting Started
Access the model through the official poolside API or via the Hugging Face Hub. You can also deploy it locally using Ollama or vLLM. Documentation is available on the official GitHub repository, providing examples for integration into your CI/CD pipelines. The open-source nature of the model under Apache 2.0 license encourages community contributions and further innovation. Users can find SDKs and sample notebooks to facilitate rapid prototyping of their own applications.
Getting started requires minimal setup, allowing engineers to begin coding with the new model within minutes of installation.
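As a starting point for local deployment, the sketch below loads the model with vLLM's offline inference API. The Hugging Face repo ID is an assumption, and the FP8 KV-cache flag simply mirrors the quantization described above; consult the model card for the supported configuration.

```python
# Minimal local inference with vLLM. The repo ID is an ASSUMPTION;
# check the model card on the Hugging Face Hub for the real one.
from vllm import LLM, SamplingParams

llm = LLM(
    model="poolside/Laguna-XS.2",  # hypothetical Hugging Face repo ID
    kv_cache_dtype="fp8",          # mirrors the FP8 KV cache described above
    max_model_len=131072,          # the advertised 128K context window
)

params = SamplingParams(temperature=0.2, max_tokens=1024)
outputs = llm.generate(["# Write an iterative quicksort in Python\n"], params)
print(outputs[0].outputs[0].text)
```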
API Pricing • Input: $0/M tokens (free for limited time) / Output: $0/M tokens (free for limited time) / Context: 128K