MindLab Research has released Macaron-V1-Preview-749B, a 749B-parameter Mixture-of-LoRA model that targets agentic workflows through specialist adapters.

On June 7, 2026, the landscape of open-source artificial intelligence shifted fundamentally. MindLab Research officially released Macaron-V1-Preview-749B, a model that doesn't just compete with closed-source giants but introduces an entirely new architectural philosophy: Mixture-of-LoRA (MoL).
While the industry has long been obsessed with scaling dense parameters, Macaron takes a more surgical approach. By combining a 744B-parameter frozen base with specialized LoRA adapters, MindLab has created a model that is both deep in knowledge and hyper-specialized in execution. This isn't just another LLM release; it is a milestone in the evolution of autonomous agentic systems.
At the heart of Macaron-V1-Preview-749B is a 744B-parameter frozen base derived from GLM-5.1. Rather than attempting to force a single set of weights to master everything from creative writing to low-level kernel debugging, MindLab uses five distinct 1B-parameter LoRA adapters, which accounts for the headline figure: 744B + 5 × 1B = 749B. This 'MoL' approach allows the model to maintain a massive general knowledge base while switching specialized 'brains' on demand.
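To make the "frozen base plus switchable adapter" idea concrete, here is a minimal sketch of a single linear layer in plain Python. It assumes the textbook LoRA formulation (output = W·x + scale · B·(A·x)); the class and method names are hypothetical, the matrices are toy-sized, and nothing here is taken from MindLab's implementation.

```python
from dataclasses import dataclass
from typing import Optional

Vector = list
Matrix = list


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m (rows x cols) by vector v (length cols)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


@dataclass(frozen=True)
class LoRAAdapter:
    name: str
    down: Matrix        # A: rank x d_in
    up: Matrix          # B: d_out x rank
    scale: float = 1.0


class MixtureOfLoRALinear:
    """One linear layer: a frozen base weight plus at most one active adapter."""

    def __init__(self, base_weight: Matrix, adapters: list):
        self.base_weight = base_weight                  # shared, never updated
        self.adapters = {a.name: a for a in adapters}
        self.active: Optional[str] = None               # None = base model only

    def change_adapter(self, name: Optional[str]) -> None:
        if name is not None and name not in self.adapters:
            raise KeyError(f"unknown adapter: {name}")
        self.active = name

    def forward(self, x: Vector) -> Vector:
        y = matvec(self.base_weight, x)                 # W x
        if self.active is not None:
            a = self.adapters[self.active]
            delta = matvec(a.up, matvec(a.down, x))     # B (A x), low-rank update
            y = [yi + a.scale * di for yi, di in zip(y, delta)]
        return y


# Toy usage: a 2x2 identity base and one rank-1 adapter named "l2".
layer = MixtureOfLoRALinear(
    base_weight=[[1.0, 0.0], [0.0, 1.0]],
    adapters=[LoRAAdapter("l2", down=[[1.0, 0.0]], up=[[0.0], [1.0]])],
)
base_out = layer.forward([1.0, 2.0])
layer.change_adapter("l2")
coding_out = layer.forward([1.0, 2.0])
```

The point of the sketch is that switching specialists only changes which small low-rank pair is added on top; the base weight is shared by every mode.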
The routing mechanism is where the design departs most from convention. Unlike traditional MoE, where a learned gate inside the network routes each token to expert feed-forward blocks and the choice is invisible to the caller, Macaron uses a Router Tool design. This exposes model selection as a standard tool call via an explicit `change_model` function. For developers, this means the model's 'mode' is fully debuggable, observable, and compatible with vLLM's standard OpenAI-compatible server mode. This architecture is tightly coupled with the Harness Context Protocol (HCP), ensuring that memory, state, and tool-call tokenization remain consistent across all specialist transitions.
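As an illustration of routing exposed as a tool call, the sketch below declares `change_model` in the OpenAI-style function-tool format and handles one such call on the harness side. Only the function name comes from the release; the `target` argument, the `l0` to `l4` identifiers (borrowed from the repository's directory names), and the state dictionary are assumptions made for this example.

```python
import json

# Hypothetical specialist identifiers, mirroring the l0/ ... l4/ directories.
SPECIALISTS = ("l0", "l1", "l2", "l3", "l4")

# Tool declaration in the OpenAI chat-completions "tools" format.
CHANGE_MODEL_TOOL = {
    "type": "function",
    "function": {
        "name": "change_model",
        "description": "Switch the active specialist adapter.",
        "parameters": {
            "type": "object",
            "properties": {
                # "target" is an assumed argument name, not a documented one.
                "target": {"type": "string", "enum": list(SPECIALISTS)},
            },
            "required": ["target"],
        },
    },
}


def apply_tool_call(tool_call: dict, state: dict) -> dict:
    """Apply one change_model call to the harness state; return the tool message."""
    fn = tool_call["function"]
    if fn["name"] != "change_model":
        raise ValueError(f"unsupported tool: {fn['name']}")
    args = json.loads(fn["arguments"])      # arguments arrive as a JSON string
    target = args["target"]
    if target not in SPECIALISTS:
        raise ValueError(f"unknown specialist: {target}")
    previous = state.get("active_model")
    state["active_model"] = target
    # Every switch is recorded, which is what makes routing observable.
    state.setdefault("history", []).append((previous, target))
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps({"active_model": target}),
    }


# Usage: a tool call shaped the way an OpenAI-compatible server returns it.
state = {"active_model": "l0"}
reply = apply_tool_call(
    {"id": "call_1", "type": "function",
     "function": {"name": "change_model", "arguments": '{"target": "l2"}'}},
    state,
)
```

Because the switch is an ordinary message in the transcript, it can be logged, replayed, and asserted on in tests like any other tool call.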
To validate this complex architecture, MindLab introduced the Macaron LivingBench. Traditional benchmarks often fail to capture the nuances of agentic behavior in dynamic environments. LivingBench utilizes coupled dynamic noise, dynamic environments, and dynamic user simulation to test how well the model handles the unpredictability of real-world tasks.
Furthermore, the model's Generative UI capabilities were tested via the A2UI-Bench. This evaluates not just whether the model can output code, but the correctness of the protocol, the task construction, and the actual user-experience lift. In interactive scenarios, Macaron achieves a 3 ms Time Per Output Token (TPOT) latency through TileRT collaboration, making it one of the most responsive high-parameter models ever released.
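To put the 3 ms figure in perspective, the short calculation below converts a per-token decode latency into throughput and into the decode time for a response of a given length. The function names and the 500-token response are illustrative, and the calculation ignores time to first token, which the source does not report.

```python
def tokens_per_second(tpot_ms: float) -> float:
    """Decode throughput implied by a per-output-token latency in milliseconds."""
    return 1000.0 / tpot_ms


def decode_time_seconds(output_tokens: int, tpot_ms: float) -> float:
    """Time spent generating output_tokens, excluding time to first token."""
    return output_tokens * tpot_ms / 1000.0


throughput = tokens_per_second(3.0)          # about 333 tokens per second
response_time = decode_time_seconds(500, 3.0)  # 1.5 seconds for 500 tokens
```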
What truly sets Macaron apart is its ability to improve itself. Through an 'AutoResearch + Context Learning' loop, the model can refine its own prompts and scaffolds. It then distills these improved trajectories back into its parameters, creating a continuous self-evolution cycle.
This is powered by the MindForge agentic RL training framework. Unlike traditional RLHF, MindForge brings the production-style agent harness directly into the RL loop. By using R3 (Rollout Routing Replay) and IcePop-style rollout correction, MindLab aims to keep the model's reasoning paths not just statistically likely, but closely aligned with expert agentic behavior.
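The release does not spell out how the rollout correction is implemented. As a minimal sketch, assume "IcePop-style" means dropping tokens whose probability under the training engine diverges too far from their probability under the rollout (inference) engine, so that mismatched tokens contribute no gradient. The band limits `low` and `high` below are illustrative values, not MindLab's settings, and the simple keep/drop mask is a simplification.

```python
import math


def rollout_mismatch_mask(train_logprobs, rollout_logprobs, low=0.5, high=2.0):
    """Return 1.0 for tokens whose train/rollout probability ratio is in [low, high]."""
    mask = []
    for lp_train, lp_rollout in zip(train_logprobs, rollout_logprobs):
        ratio = math.exp(lp_train - lp_rollout)     # p_train / p_rollout
        mask.append(1.0 if low <= ratio <= high else 0.0)
    return mask


def masked_mean(per_token_loss, mask):
    """Average the loss over kept tokens only; 0.0 if every token was dropped."""
    kept = sum(mask)
    if kept == 0:
        return 0.0
    return sum(l * m for l, m in zip(per_token_loss, mask)) / kept


# Usage: the second token is 10x more likely in training than in rollout,
# so it falls outside the band and is excluded from the loss.
train_lp = [math.log(0.5), math.log(0.5)]
rollout_lp = [math.log(0.5), math.log(0.05)]
mask = rollout_mismatch_mask(train_lp, rollout_lp)
loss = masked_mean([2.0, 100.0], mask)
```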
Macaron-V1-Preview-749B is designed for high-stakes, complex environments. Because of its specialist LoRAs, it excels in domains where general-purpose models often hallucinate or lose precision. The L2 (Coding) adapter makes it a powerhouse for software engineering, while the L3 adapter enables the next generation of 'Generative UI' applications where the interface morphs in real time to meet user needs.
For enterprise developers, the L4 (OpenClaw-style) adapter is a game-changer for building autonomous agents that can navigate complex software ecosystems. Whether you are building a personal life assistant (L1) or a massive-scale coding agent, Macaron provides the specialized precision required for production-grade reliability.
Macaron-V1-Preview-749B is available now under the MIT license. You can find the complete model repository on Hugging Face, which includes the base model at the root and the specialized LoRAs organized under the `l0/` through `l4/` directories. This single-repository structure simplifies deployment and fine-tuning workflows.
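Given that layout, one plausible way to serve the base model with all five adapters is vLLM's LoRA support. The helper below only builds the command line from a local copy of the repository. It relies on vLLM's existing `--enable-lora` and `--lora-modules name=path` options, but whether the published adapters load this way without extra settings is an assumption, and the local path is a placeholder.

```python
from pathlib import Path

# Adapter directories as described for the repository layout.
ADAPTER_DIRS = ("l0", "l1", "l2", "l3", "l4")


def build_serve_command(repo_root: str) -> list:
    """Build a `vllm serve` command for the base model plus one LoRA per directory."""
    root = Path(repo_root)
    lora_modules = [f"{name}={(root / name).as_posix()}" for name in ADAPTER_DIRS]
    return [
        "vllm", "serve", root.as_posix(),   # base model lives at the repo root
        "--enable-lora",
        "--lora-modules", *lora_modules,    # one name=path entry per adapter
    ]


# Usage with a placeholder path to a local snapshot of the repository.
command = build_serve_command("/models/macaron")
```

Each adapter name registered this way can then be requested through the `model` field of the OpenAI-compatible API, which is how vLLM selects a LoRA per request.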
For those looking for a managed experience, MindLab is launching managed inference and advanced post-training capabilities on the MinT platform shortly. You can also experience the model's capabilities immediately via the live preview at macaron.im. Keep an eye out for the upcoming V1 non-preview release, which will include highly optimized 30B and 200B variants.