16h ago

Alibaba’s Qwen team releases Qwen3.7-Max, its closed-weights flagship model that scores 56.6 on the Artificial Analysis Intelligence Index after heavy reinforcement learning

It completed a 35-hour autonomous kernel optimization run with 1,158 tool calls.


Original post

#1153@XEOPHONOP

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@TEORTAXESTEX

In other notes, Qwen 3.7-Max is the strongest Chinese model on CritPt, and in fact stronger than Gemini 3.5 Flash and Opus 4.6/4.7. The leap from the last generation is almost 4x, the largest I've ever seen.

10:20 AM · May 20, 2026

Reposted by

#1497@ZEPHYR_Z9

#517@ANDREWCURRAN_

#40@JEREMYPHOWARD

QUOTE POST

#259Teknium 🪽@TEKNIUM

Anyone tried in Hermes yet? New OS king?

8:27 PM · May 21, 2026 · 32.8K Views

#420Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@TEORTAXESTEX

@Teknium Bro it's not open

9:06 PM · May 21, 2026 · 628 Views

QUOTE POST

#980Lisan al Gaib@SCALING01

benchmarks look really good

but it still uses a ton of reasoning tokens

1:23 PM · May 21, 2026 · 6.6K Views

QUOTE POST

#1032Rohan Paul@ROHANPAUL_AI

Alibaba just released Qwen3.7-Max.

Their best flagship model, built for real-world tasks and production environments.

- Agent reliability is the center of the story: the model must plan steps, call tools, inspect results, fix mistakes, and continue without collapsing after the first wrong turn.

- 56.6 on the Artificial Analysis Intelligence Index, up 4.8 points from Qwen3.6-Max. Qwen 3.7 Max sits at 5th, pretty much on par with GPT 5.4 (xhigh).

- The Intelligence Index gains over Qwen3.6 Max Preview are concentrated in scientific reasoning, agentic capability and coding.

- One important layer of the serving stack, the inference kernel, was heavily optimized, going from near-baseline speed to a 10.0x geometric mean speedup after many rounds of low-level GPU optimization.

Qwen@Alibaba_Qwen

📣 Meet Qwen3.7-Max, our latest flagship, made for the Agent Era. A versatile foundation for agents that actually get things done:

🧑‍💻 Coding agent, end to end. Frontend prototypes, multi-file refactors, real debugging: nails it.

🗂️ A reliable office and productivity assistant. Get your work done through MCP integrations and multi-agent orchestration.

⏱️ Long-horizon autonomy. 35 hours straight on a kernel optimization task: 1,000+ tool calls, zero hand-holding.

🔌 Scaffold-agnostic. Claude Code, OpenClaw, Qwen Code, or your own stack. Consistent reliability everywhere.

API's up on Alibaba Model Studio. You can also take it for a spin on Qwen Studio. Go build something wild! 🏃🏃‍♂️

📖 Blog: https://qwen.ai/blog?id=qwen3.7

✅ Qwen Studio: https://chat.qwen.ai/?models=qwen3.7-max

⚡️ API: https://modelstudio.console.alibabacloud.com/ap-southeast-1?tab=doc#/doc/?type=model&url=2840914_2&modelId=qwen3.7-max&serviceSite=international

1:15 PM · May 21, 2026 · 452.8K Views

5:32 PM · May 21, 2026 · 3.3K Views
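The "10.0x geometric mean speedup" quoted above is an average taken over per-workload speedup ratios, not a single timing. As a minimal sketch of how such a figure is typically computed (the function name and the timings below are hypothetical illustrations, not Qwen's data or benchmark harness):

```python
import math


def geometric_mean_speedup(baseline_ms, optimized_ms):
    """Geometric mean of per-workload speedups (baseline time / optimized time)."""
    if len(baseline_ms) != len(optimized_ms) or not baseline_ms:
        raise ValueError("need two non-empty lists of equal length")
    ratios = [b / o for b, o in zip(baseline_ms, optimized_ms)]
    # Average in log space: a 2x gain and a 0.5x regression cancel to 1x.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical timings in milliseconds for three input shapes (assumed values).
baseline = [8.0, 20.0, 50.0]
optimized = [2.0, 2.0, 2.0]

# Per-shape speedups are 4x, 10x and 25x; their product is 1000.
print(round(geometric_mean_speedup(baseline, optimized), 2))  # prints 10.0
```

The arithmetic mean of the same three ratios would be 13x, which is why benchmark write-ups usually prefer the geometric mean: one outlier workload cannot dominate the headline number.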

QUOTE POST

#1032Rohan Paul@ROHANPAUL_AI

Qwen 3.7 Max is super close to the frontier models for coding and agentic abilities.

And it’s now available on AI/ML API.

Agent reliability is the center of the story, and on Artificial Analysis it sits at 5th, pretty much on par with GPT 5.4 (xhigh) and a notch above the just-released Gemini 3.5 Flash.

AI/ML API is also giving away free codes for users who want to try it; see the quoted tweet.

7:17 PM · May 21, 2026 · 3.9K Views

QUOTE POST

#1496Chubby♨️@KIMMONISMUS

Alibaba released Qwen 3.7 max. Benchmarks incredible.

Their new model ran autonomously for 35 hours, made 1,158 tool calls, and achieved a 10x speedup - on a single attention kernel.

This isn't "AI improving itself across the board." It's a model grinding through compile-profile-rewrite loops on one well-defined optimization target.

Impressive? Absolutely. The kind of self-improvement people will imagine when they see the headline? Not yet.

The actually interesting claim is buried deeper: Qwen says agentic capabilities generalize from diverse training environments the same way language capabilities generalize from diverse text. If that holds, it's a bigger deal than any benchmark number.

1:31 PM · May 21, 2026 · 49.7K Views
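The "compile-profile-rewrite loop" described in the post above can be pictured as a greedy search: propose a rewrite, check that it still gives the right answer, time it, and keep it only if it is faster. The sketch below is an assumption-laden toy in plain Python, not Qwen's actual harness; `profile`, `optimize` and the three example kernels are hypothetical names, and the candidate list stands in for rewrites a model would propose.

```python
import time
from typing import Callable, List

Kernel = Callable[[List[float]], float]


def profile(kernel: Kernel, data: List[float], repeats: int = 5) -> float:
    """Best-of-N wall-clock time in seconds for one kernel call."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        kernel(data)
        best = min(best, time.perf_counter() - start)
    return best


def optimize(baseline: Kernel, candidates: List[Kernel], data: List[float]) -> Kernel:
    """Greedy loop: keep a candidate only if it is correct and faster."""
    expected = baseline(data)
    best, best_time = baseline, profile(baseline, data)
    for candidate in candidates:  # stand-in for model-proposed rewrites
        if abs(candidate(data) - expected) > 1e-6 * max(1.0, abs(expected)):
            continue  # reject rewrites that change the result
        elapsed = profile(candidate, data)
        if elapsed < best_time:
            best, best_time = candidate, elapsed
    return best


def slow_sum_squares(data: List[float]) -> float:
    total = 0.0
    for x in data:
        total += x * x
    return total


def fast_sum_squares(data: List[float]) -> float:
    return sum(x * x for x in data)


def wrong_kernel(data: List[float]) -> float:
    return sum(data)  # deliberately incorrect "optimization"


data = [float(i) for i in range(1000)]
chosen = optimize(slow_sum_squares, [wrong_kernel, fast_sum_squares], data)
print(chosen.__name__)
```

Which of the two correct kernels wins depends on the machine, so no output is shown; the point is that the incorrect rewrite is never accepted, however fast it runs.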

QUOTE POST

#1946Chujie Zheng@CHUJIEZHENG

For Qwen3.7-Max, we have invested far more compute into RL training than ever before. Its top-tier AA score confirms the resulting general and agentic capabilities.

This is just the start. We will firmly push forward RL scaling to build more powerful Qwen models. Stay tuned!

10:08 AM · May 21, 2026 · 62.1K Views

Sentiment

Positive: 80.6%

Negative: 19.4%

Positive users highlight Qwen3.7-Max's strong benchmarks, 10x speedups, and rapid progress, while negative users criticize limited autonomy, weak competitor comparisons, and missing features like fine-tuning.

116 comments with sentiment.

Digg Depth

Kevin Rose asked

How did 3.7max do on coding tests vs GPT 5.5 max and Gemini 3.5 flash? Also, how fast is it, compare output of each model

Qwen3.7-Max posts 69.7 on Terminal-Bench 2.0 and 60.6 on SWE-bench Pro, trailing GPT-5.5 Max (~82.7 on Terminal-Bench) while beating Gemini 3.5 Flash on coding averages (73.6 vs 54.5). ^¹^²

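On speed, Qwen reports a 10x geometric mean speedup on an inference kernel after heavy tuning, which is a kernel-level result rather than an end-to-end throughput figure. The model edges Gemini 3.5 Flash in some head-to-head output speed tests, though Gemini claims 4x faster token generation than GPT-5.5 on agent workflows.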
