Tech · 22d ago

Qwen 3.7-Max Nears Opus With 60.6 on SWE-Bench Pro and Strong NL2Repo Score

Original post

Composite table for the four benchmarks where Qwen has shown both 3.6-Max (Preview) and 3.7-Max. The progress is not exactly dramatic, but it is significant for 1 month. …Except NL2Repo. Is this real? They claim to have matched Opus in the one thing Opus is hyped for.

@teortaxesTex @Elaina43114880 Am I dreaming? 60+ in SWE bench pro?

11:29 PM · May 19, 2026 · 23.1K Views
Sentiment

Commenters are impressed by Qwen 3.7-Max's benchmark results on CritPt, SWE-Bench Pro, and NL2Repo. Several read the gains as practical improvements, and one suggests Alibaba could win the AI race at its current release pace.

Positive: 100.0% · Negative: 0.0% (11 comments with sentiment)
Cluster Engagement
Posts from X
Most Activity
Views 38.6K · Bookmarks 57 · Likes 379 · Retweets 30 · Replies 23

In other notes, Qwen 3.7-Max is the strongest Chinese model on CritPt, and in fact stronger than Gemini 3.5 Flash and Opus 4.6/4.7. The leap from the last generation is almost 4x, the largest I've ever seen.

Naeem @identity_matrix

qwen 3.7 max scores are live on Artificial Analysis !

22d · Views 38.6K · Likes 379 · Bookmarks 57

Conflicted feeling. Binyuan and Junyang are out; I liked them, lamented their departure. But after 3.6 and 3.7, I can see Qwen as a formidable lab for the first time. And yet, Qwen 3.5 is *their* base. Their recipe. This is mostly Alibaba going ham with RL. Almost stolen valor.

22d · Views 7K · Likes 68 · Bookmarks 13

as an aside,
> PNG 17277 × 10523
> 6,2 MB on disk
why does Qwen insist on doing this?

22d · Views 2.6K · Likes 15 · Bookmarks 2

SK @Samking207

@teortaxesTex Alibaba might win the whole AI race, at this pace of release now every 1 month

22d · Views 875 · Likes 16

小白脸 @zhenbailian

@teortaxesTex So they altered GLM's official scores? Interesting. It was definitely not as good as the benchmarks suggested.

22d · Views 661

Anime fan @badboy999654

@teortaxesTex The new leadership is cooking

22d · Views 378 · Likes 2

@teortaxesTex They've been losing talent left and right to Xiaomi and other labs. This is totally unexpected; the numbers are really, really good and look promising.

22d · Views 330 · Likes 2

Christopher @Chris65536

@teortaxesTex nice, great to see Qwen getting some frontier wins. i love them based on their consumer GPU open releases -- would love to see them grab some API cash.

22d · Views 471 · Likes 1

jayce @jayce_edits

@teortaxesTex impressive, CritPt isn't very benchmaxxable

22d · Views 260 · Likes 1

spencer merrill @spencermerrill

@teortaxesTex it's kind of insane how well this model is performing against its priors. I'm really happy to be seeing these numbers. I need to run it on my internal advertising evals.

22d · Views 809

Alpha Exponent @AlphaExponent

@teortaxesTex open models apparently accelerating

ai (ex-crypto) bros: AGI over profits

real-world economics eventually coming home to roost

ai (ex-crypto) bros: surprised pikachu face?

22d · Views 670

Charuru Charuru @CharuruCha14310

@teortaxesTex Looks like they dumped gated deltanet

22d · Views 194 · Likes 1

Thomas Ip @_thomasip

@teortaxesTex seems like Qwen's focus on frontier performance is paying off.

22d · Views 451

Pang Shuo @pangshuo1981

@teortaxesTex Well said. Performance gains like this are where DeepSeek starts to feel very practical. Related to that, I am testing a unified AI workflow here: https://aevrynai.com/register?invite=uDaiK5WU

22d · Views 412

Naeem @identity_matrix

@teortaxesTex It's out on artificial analysis btw

22d · Views 335

noname @malikwas1f

@teortaxesTex Wowzer!

22d · Views 276

Ding @zhaoxiongding

@teortaxesTex What’s critpt

22d · Views 258