Tech · 3h ago

AI developer @teortaxesTex says Claude Sonnet 5.0 underperforms Sonnet 4.6 on non-target benchmarks, sparking 'benchnerfing' debates

Story Overview

Anthropic rolled out Claude Sonnet 5 as its most capable agentic model yet, touting stronger reasoning and tool use than Sonnet 4.6 at a lower price point, but independent checks quickly surfaced regressions on benchmarks that fall outside the usual optimization targets.

Original post

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

The one bench nobody wants to hill-climb

banteg@banteg

welcome to benchnerfing era, sonnet 5 weaker than sonnet 4.6

11:18 AM · Jun 30, 2026 · 271 Views

Open Question

Performance gaps invite fresh scrutiny

A vocal AI developer highlighted weaker scores versus the previous Sonnet on non-targeted tests, prompting others to ask whether the new model even clears the bar set by GLM-5.2.

Developer Impact

Version jump fuels naming doubts

Some commentators argue the incremental gains do not justify the 5.0 label and might better suit a point release like 4.8, keeping the conversation focused on what actually moved forward.

Sentiment

Negative: Users slammed Anthropic's Claude Sonnet 5 launch as a nerfed downgrade that delivers worse performance and a higher cost per task than prior versions like Sonnet 4.6.

Pos: 0.0%

Neg: 100.0%

15 comments with sentiment.

Posts from X

Most Activity

Views 7.1K · Bookmarks 7 · Likes 160 · Retweets 8 · Replies 15

Lisan al Gaib@scaling01

I knew it was going to be an insane nothingburger, because there's currently a soft ban on frontier capabilities

but I genuinely don't understand why they didn't call it Sonnet 4.8 or Sonnet 4.9, because this artificially nerfed piece of shit is not worthy of the 5.0 naming

2h

Lisan al Gaib@scaling01

Claude 5 has so far been the worst launch by Anthropic

Fable 5 isn't available and Sonnet 5 was nerfed to death

2h

Lisan al Gaib@scaling01

like does it even beat GLM-5.2?

2h

Lisan al Gaib@scaling01

turns out it does (barely) at like 3x the price

2h

Lincoln 🇿🇦@Presidentlin

@0xVita @scaling01 One of these days, I need to watch these movies.

2h

Jacob Centner@JacobCentner

@scaling01 feels like haiku-5 instead, I think they should have gone with that

roughly matching sonnet-4.6 perf on medium for half the cost is cool, lot of enterprises are going to love that

2h

wetbrain@0xVita

@scaling01 Thank god the mahdi is here to tell us the truth we @Presidentlin were worried you got hit by a bus when you didn’t break the news first

2h

💺@patience_cave

@scaling01 i think you mean best launch?

2h

Andrew Rivers@itsandrewrivers

@scaling01 We have entered the age of artificially-limited frontier model regression.

We can thank the gov for that.

US is no longer going to be a safe haven for rapid AI innovation, unfortunately.

2h

Irving@ieqr_

@scaling01 I agree, it was an enormous letdown. It doesn't reach Opus capabilities and it's in the opposite Pareto frontier quadrant, the worst quadrant in token efficiency at least for the benchmarks

2h

Luigi Pagani@Luigi1549898

@scaling01 This is probably haiku size, Opus 4.6 was the original Sonnet 5. They are just greedy

2h

AYLI@yanyan9_A

@teortaxesTex Liang Wenfeng if you can hear me obliterate the evil west V4.1 PRO MAX.finalversion to put an end to this charrade

3h

Habanero@singularityHSN

@scaling01 then people would ask for sonnet 5… best they just pump out this filler model now and focus their attention on the big guns again, who cares about anything other than the SOTA

2h