StepFun releases Step 3.7 Flash, an open-weight 198B MoE model that runs at 400 tokens per second
The Apache 2.0 model uses 11 billion active parameters
I've been waiting for this! They managed to do it before June, and they open-sourced it right away! @antirez, as I've been saying: look at this model. It's much smaller than V4-Flash, it's multimodal, it's fast. It deserves to be added.
⚡️ Step 3.7 Flash is here: The new frontier is agent efficiency.
#1 ClawEval-1.1 (67.1), #1 SimpleVQA Search (79.2), #2 SWE-PRO (56.3), 95.3 on V* Python.
Open weights under Apache 2.0. Built for agentic, coding, search, and multimodal workflows, balancing speed, cost, and reliable execution.
- 400 TPS. 198B sparse MoE, ~11B active. 256K context, 3 reasoning levels.
- Understands UIs, charts, docs, images, then writes code or calls tools to act on what it sees.
- Web + visual search reaches further: more sources, deeper follow-up.
- Reliable tool use: less drift, fewer broken toolcalls. 98%+ on τ²-bench across all difficulty levels.
- Works with Claude Code, KiloCode, Hermes Agent, OpenClaw, and protocols like MCP.
- Runs locally on Mac Studio M4 Max, DGX Spark, AMD AI Max+ 395.
GitHub: http://github.com/stepfun-ai/Step-3.7-Flash
HuggingFace: http://huggingface.co/stepfun-ai/Step-3.7-Flash
GGUF: http://huggingface.co/stepfun-ai/Step-3.7-Flash-GGUF
ModelScope: http://modelscope.cn/models/stepfun-ai/Step-3.7-Flash
API: http://platform.stepfun.ai
Blog: http://static.stepfun.com/blog/step-3.7-flash/
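As a rough back-of-the-envelope check on the "198B sparse MoE, ~11B active" and "runs locally" claims in the announcement, the sketch below computes the share of parameters active per token and the weight-only memory footprint at a few bit widths. The 16/8/4-bit widths are illustrative assumptions, not formats confirmed by the announcement, and the estimate ignores KV cache, activations, and runtime overhead.

```python
# Figures taken from the announcement; everything else is an assumption.
TOTAL_PARAMS = 198e9   # total parameters (198B sparse MoE)
ACTIVE_PARAMS = 11e9   # approximate parameters active per token (~11B)


def active_fraction(total: float, active: float) -> float:
    """Share of the model's parameters used for each generated token."""
    return active / total


def weight_gb(params: float, bits_per_weight: int) -> float:
    """Weight-only footprint in GB (10^9 bytes) at a given bit width.

    Ignores KV cache, activations, and any per-format overhead.
    """
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    share = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    print(f"active share per token: {share:.1%}")
    # Hypothetical bit widths, chosen only for illustration.
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit weights: {weight_gb(TOTAL_PARAMS, bits):.0f} GB")
```

Under these assumptions, roughly 5.6% of the weights are active per token, and a 4-bit quantization would need about 99 GB for weights alone.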
Step 3.7 is generally neck and neck with V4-Flash (which is underappreciated as a powerful agent); I think they targeted it. But it goes to show that vision is a must now. V4.1V can't come soon enough.

First impressions: StepFun 3.7 vision is kinda low-res and hallucinatory, behind MiMo 2.5. Kimi >> DS-Vision > MiMo > StepFun. Well, it's their first vision model in the Flash series, and this is by far the smallest and fastest model. But no bueno.