
Qwen 3.7 Max Underrated On Vals Vibe Code Bench, User Claims


Original post

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

Speaking of which, I think Vals underrates Qwen 3.7 Max. It's one of the strongest Chinese models overall, but it's pulled down by a ridiculously low Vibe Code Bench v1.1 score. Like, it scores below its lesser open-source siblings: 3.7 *Plus* gets 46.4 there. What's up?

Vals AI@ValsAI

For those looking into open-weight models in light of recent news … we’ve just evaluated Kimi K2.7 Code on the Vals coding benchmarks

6:24 PM · Jun 13, 2026 · 4.3K Views

Sentiment

Users are excited about Kimi's rapid gains from one iteration to the next on Vibe Code Bench, and thank Vals AI for quickly fixing the scoring bug.

Positive: 100.0% · Negative: 0.0% (3 comments with sentiment)


Posts from X


Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

Glad to have that fixed. Still, VCB 1.1 is interesting in that it's probably the only eval where DSV4-Pro is straightforwardly the best Chinese open model (and the best Chinese OR open model). Feels about right, but I'm biased, ofc.

Vals AI@ValsAI

@teortaxesTex Thanks for catching this, @teortaxesTex. This was indeed a bug on our site. Note that it did not affect the Vals Index, and we have fixed the score on VCB: https://www.vals.ai/models/alibaba_qwen3.7-max


Vals AI@ValsAI


Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

@ValsAI Thanks! Is Nemotron similar, or is it actually that low?

Sean@SomethingOnSnow

@teortaxesTex It’s wild how much Kimi gains with each iteration. An Opus iteration barely did anything, but a Kimi iteration might as well be a new generation.

WEF Chad@tweetmaster153

@teortaxesTex The Vals post about the MiMo models says that v2.5 ranks higher than 2.5 Pro specifically on Vibe Code Bench because 2.5 Pro lacks vision.

barry@Barry_lc

@teortaxesTex Indeed.