16h ago

Alibaba’s Qwen team releases Qwen3.7-Max, its closed-weights flagship model that scores 56.6 on the Artificial Analysis Intelligence Index after heavy reinforcement learning

It completed a 35-hour autonomous kernel optimization run with 1,158 tool calls.


In other news, Qwen 3.7-Max is the strongest Chinese model on CritPt, and in fact stronger than Gemini 3.5 Flash and Opus 4.6/4.7. The leap from the last generation is almost 4x, the largest I've ever seen.

10:20 AM · May 20, 2026

benchmarks look really good

but it still uses a ton of reasoning tokens

1:23 PM · May 21, 2026 · 6.6K Views

Alibaba just released Qwen3.7-Max.

Their best flagship model built for real-world tasks and production environments.

- Agent reliability is the center of the story: the model must plan steps, call tools, inspect results, fix mistakes, and continue without collapsing after the first wrong turn.

- 56.6 on the Artificial Analysis Intelligence Index, up 4.8 points from Qwen3.6-Max. That puts Qwen 3.7 Max in 5th place, pretty much on par with GPT 5.4 (xhigh).

- The Intelligence Index gains over Qwen3.6 Max Preview are concentrated in scientific reasoning, agentic capability and coding.

- One important layer of the serving stack, the inference kernel, was heavily optimized, going from near-baseline speed to a 10.0x geometric mean speedup after many rounds of low-level GPU optimization.
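
For readers unfamiliar with the metric, here is a minimal sketch of how a geometric mean speedup is typically computed across several benchmark workloads. The function name and the timings are hypothetical and are not taken from the Qwen report; the only assumption is that speedup per workload means baseline time divided by optimized time.

```python
import math

def geometric_mean_speedup(baseline_ms, optimized_ms):
    """Geometric mean of per-workload speedups (baseline time / optimized time)."""
    if len(baseline_ms) != len(optimized_ms) or not baseline_ms:
        raise ValueError("need two non-empty lists of equal length")
    # Average the logs of the ratios, then exponentiate. This equals the
    # n-th root of the product of the ratios but is more stable numerically.
    log_sum = sum(math.log(b / o) for b, o in zip(baseline_ms, optimized_ms))
    return math.exp(log_sum / len(baseline_ms))

# Hypothetical timings in milliseconds (illustration only, not real results).
baseline = [8.0, 20.0, 50.0]
optimized = [2.0, 2.0, 2.0]

# Per-workload speedups are 4x, 10x and 25x; their geometric mean is about 10x.
print(round(geometric_mean_speedup(baseline, optimized), 3))
```

The geometric mean is the usual choice for averaging ratios because one outlier workload moves it less than it would move an arithmetic mean: in the toy data above the arithmetic mean of the speedups would be 13x rather than 10x.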

Qwen @Alibaba_Qwen

📣Meet Qwen3.7-Max — our latest flagship, made for the Agent Era. A versatile foundation for agents that actually get things done: 🧑‍💻 Coding agent, end to end. Frontend prototypes, multi-file refactors, real debugging — nails it. 🗂️ A reliable office and productivity assistant. Get your work done through MCP integrations and multi-agent orchestration. ⏱️ Long-horizon autonomy. 35 hours straight on a kernel optimization task — 1,000+ tool calls, zero hand-holding. 🔌 Scaffold-agnostic. Claude Code, OpenClaw, Qwen Code, or your own stack. Consistent reliability everywhere. API's up on Alibaba Model Studio. You can also take it for a spin on Qwen Studio. Go build something wild!🏃🏃‍♂️ 📖 Blog: https://qwen.ai/blog?id=qwen3.7 ✅ Qwen Studio: https://chat.qwen.ai/?models=qwen3.7-max ⚡️ API:https://modelstudio.console.alibabacloud.com/ap-southeast-1?tab=doc#/doc/?type=model&url=2840914_2&modelId=qwen3.7-max&serviceSite=international

1:15 PM · May 21, 2026 · 452.8K Views
5:32 PM · May 21, 2026 · 3.3K Views

Qwen 3.7 Max is super close to the frontier models for coding and agentic abilities.

And it’s now available on AI/ML API.

Agent reliability is the center of the story. On Artificial Analysis it sits at 5th, pretty much on par with GPT 5.4 (xhigh) and a notch above the just-released Gemini 3.5 Flash.

AI/ML API is also giving away free codes for users who want to try it; see the quoted tweet.

7:17 PM · May 21, 2026 · 3.9K Views

Alibaba released Qwen 3.7 Max. The benchmarks are incredible.

Their new model ran autonomously for 35 hours, made 1,158 tool calls, and achieved a 10x speedup - on a single attention kernel.

This isn't "AI improving itself across the board." It's a model grinding through compile-profile-rewrite loops on one well-defined optimization target.

Impressive? Absolutely. The kind of self-improvement people will imagine when they see the headline? Not yet.

The actually interesting claim is buried deeper: Qwen says agentic capabilities generalize from diverse training environments the same way language capabilities generalize from diverse text. If that holds, it's a bigger deal than any benchmark number.

1:31 PM · May 21, 2026 · 49.7K Views

For Qwen3.7-Max, we have invested far more compute into RL training than ever before. Its top-tier AA score confirms the resulting general and agentic capabilities.

This is just the start. We will firmly push forward RL scaling to build more powerful Qwen models. Stay tuned!

10:08 AM · May 21, 2026 · 62.1K Views