/Tech · 22d ago

Qwen 3.7-Max Nears Opus With 60.6 on SWE-Bench Pro and Strong NL2Repo Score


Original post

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex · #501 in Tech

Composite table for the four benchmarks where Qwen has shown both 3.6-Max (Preview) and 3.7-Max. The progress is not exactly dramatic, but it is significant for 1 month. …Except NL2Repo. Is this real? They claim to have matched Opus in the one thing Opus is hyped for.

Nid All | AI art and more 🤖@n1d_all

@teortaxesTex @Elaina43114880 Am I dreaming? 60+ in SWE bench pro?

11:29 PM · May 19, 2026 · 23.1K Views

Sentiment

Many users are impressed by Qwen 3.7-Max's strong benchmark results on CritPt and similar tests because the gains show practical performance improvements and suggest Alibaba could lead the AI race.

Pos 100.0% · Neg 0.0% (11 comments with sentiment)

Cluster Engagement

Posts from X

Most Activity

Views 38.6K · Bookmarks 57 · Likes 379 · Retweets 30 · Replies 23

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

In other notes, Qwen 3.7-Max is the strongest Chinese model on CritPt, and in fact stronger than Gemini 3.5 Flash and Opus 4.6/4.7. The leap from the last generation is almost 4x, the largest I've ever seen.

Naeem@identity_matrix

qwen 3.7 max scores are live on Artificial Analysis !

22d · 38.6K views · 379 likes · 57 bookmarks

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

Conflicted feeling. Binyuan and Junyang are out; I liked them, lamented their departure. But after 3.6 and 3.7, I can see Qwen as a formidable lab for the first time. And yet, Qwen 3.5 is *their* base. Their recipe. This is mostly Alibaba going ham with RL. Almost stolen valor.

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

22d7K6813

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

impressively, it does so in fewer tokens than V4-Pro

Alibaba has caught up in hard reasoning? no idea about costs

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

22d2.7K282

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

as an aside,

> PNG 17277 × 10523

> 6,2 MB on disk

why does Qwen insist on doing this?

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

22d2.6K152

SK@Samking207

@teortaxesTex Alibaba might win the whole AI race, at this pace of release now every 1 month

22d87516

小白脸@zhenbailian

@teortaxesTex So they altered GLM's official scores? Interesting. It was definitely not as good as the benchmarks suggested.

22d661

Anime fan@badboy999654

@teortaxesTex The new leadership is cooking

22d3782

Saleh Abdulaziz@Sal7one

@teortaxesTex They've been losing talent left and right to Xiaomi and other labs. This is totally unexpected; the numbers are really, really good and look promising.

22d3302

Christopher@Chris65536

@teortaxesTex nice, great to see Qwen getting some frontier wins. i love them based on their consumer GPU open releases -- would love to see them grab some API cash.

22d4711

jayce@jayce_edits

@teortaxesTex impressive, critpt isnt very benchmaxxable

22d2601

spencer merrill@spencermerrill

@teortaxesTex it's kind of insane how well this model is performing against its priors. I'm really happy to be seeing these numbers. I need to run it on my internal advertising evals.

22d809

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

@zhenbailian nah, it's from the qwen 3.6 blog post; they revised it down in 3.7, not sure why

22d702

Alpha Exponent@AlphaExponent

@teortaxesTex open models apparently accelerating

ai (ex-crypto) bros: AGI over profits

real-world economics eventually coming home to roost

ai (ex-crypto) bros: surprised pikachu face?

22d670

Charuru Charuru@CharuruCha14310

@teortaxesTex Looks like they dumped gated deltanet

22d1941

Thomas Ip@_thomasip

@teortaxesTex seems like Qwen's focus on frontier performance is paying off.

22d451

CK 🏴‍☠️@cyprianpl

@teortaxesTex What's the pricing leap?

22d422

Pang Shuo@pangshuo1981

@teortaxesTex Well said. Performance gains like this are where DeepSeek starts to feel very practical. Related to that, I am testing a unified AI workflow here: https://aevrynai.com/register?invite=uDaiK5WU

22d412

Naeem@identity_matrix

@teortaxesTex It's out on artificial analysis btw

22d335

noname@malikwas1f

@teortaxesTex Wowzer!

22d276

Ding@zhaoxiongding

@teortaxesTex What’s critpt

22d258