Introduction: A New Paradigm in Flash Models

The landscape of 'Flash' models is shifting. For a long time, developers had to choose between the fast responses of small models and the deeper reasoning of frontier-scale models. On May 29, 2026, StepFun released Step-3.7-Flash, a model that aims to deliver both.

Step-3.7-Flash isn't just another incremental update. It is a natively multimodal model designed specifically for agentic workloads. By combining high throughput with visual understanding and reliable tool calling, StepFun has delivered a model that feels less like a chatbot and more like a digital coworker, one that can navigate complex UIs and execute code in real time.

Released: May 29, 2026
Core Focus: Native Multimodality & Agentic Workflows
Availability: Open weights under Apache 2.0 license

Architecture: The Power of Sparse MoE

At the heart of Step-3.7-Flash lies a Sparse Mixture of Experts (MoE) architecture. The model has 198B total parameters but activates only about 11B of them per token, roughly 5.6% of the total. This 'intelligence density' lets the model keep the capacity of a large network while paying a per-token compute cost closer to that of a much smaller one.
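To make the idea of sparse activation concrete, here is a minimal sketch of top-k expert routing, a common gating scheme in MoE layers. It is an illustration only: the article does not say how many experts Step-3.7-Flash has, how many it activates per token, or how its router works, so `NUM_EXPERTS`, `TOP_K`, and the toy scalar "experts" below are all hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are a softmax over the selected logits only, so they sum to 1.
    """
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_layer(x, experts, router_logits, k):
    """Run only the k selected experts and mix their outputs by router weight."""
    out = 0.0
    for idx, weight in route_top_k(router_logits, k):
        out += weight * experts[idx](x)
    return out


# Hypothetical toy setup (not the real model's configuration):
# 8 scalar "experts", of which 2 are evaluated per token.
NUM_EXPERTS, TOP_K = 8, 2
experts = [lambda x, scale=scale: scale * x for scale in range(1, NUM_EXPERTS + 1)]

rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

selected = route_top_k(logits, TOP_K)
y = moe_layer(3.0, experts, logits, TOP_K)
```

In this sketch the other six experts are never called for the token, which is the source of the compute savings: total capacity grows with the number of experts, while per-token cost grows only with `TOP_K`.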

This architectural efficiency translates directly into developer-friendly metrics. The model reaches a stated throughput of 400 tokens per second, which suits real-time applications and high-concurrency production environments. It also supports a 256K-token context window and offers three reasoning levels, so developers can trade speed for depth depending on the complexity of the task.

Total Parameters: 198B
Active Parameters: ~11B (Sparse MoE)
Context Window: 256K tokens
Throughput: 400 tokens/sec
Reasoning: 3 adjustable levels
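The figures above lend themselves to some quick back-of-the-envelope arithmetic. The sketch below uses only the numbers stated in this article; the helper names are made up for illustration, and the latency estimate assumes the 400 tokens/sec figure is a steady decode rate, ignoring prompt processing and network overhead.

```python
# Figures as stated in the article.
TOTAL_PARAMS_B = 198    # total parameters, in billions
ACTIVE_PARAMS_B = 11    # approximate active parameters per token, in billions
THROUGHPUT_TPS = 400    # stated output throughput, tokens per second


def active_fraction(active_b, total_b):
    """Share of the model's parameters used for a single token."""
    return active_b / total_b


def generation_seconds(num_tokens, tokens_per_second=THROUGHPUT_TPS):
    """Idealized time to emit num_tokens at a constant decode rate."""
    return num_tokens / tokens_per_second


frac = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
print(f"Active share per token: {frac:.1%}")                      # Active share per token: 5.6%
print(f"2,000-token reply: {generation_seconds(2000):.1f} s")     # 2,000-token reply: 5.0 s
```

Real-world latency will depend on the serving setup, so treat the second number as a lower bound rather than a measurement.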

Benchmark Dominance: Setting New Standards

Step-3.7-Flash takes the top spot on several benchmarks that test multimodal and agentic capability. It ranks #1 on ClawEval-1.1 with a score of 67.1 and #1 on SimpleVQA Search with a score of 79.2, which points to strong performance at interpreting visual data and searching the web.

For developers focused on automation and coding, the results are even more impressive. The model scored 95.3 on the V* Python benchmark and placed #2 on SWE-PRO with a score of 56.3. Perhaps most importantly for agentic reliability, it achieved over 98% on τ²-bench across all difficulty levels, which suggests that its tool calls are correct in the large majority of cases.

Step-3.7-Flash: The New Open-Weight King of Multimodal Agentic AI
