The one bench nobody wants to hill-climb
welcome to benchnerfing era, sonnet 5 weaker than sonnet 4.6
Anthropic rolled out Claude Sonnet 5 as its most capable agentic model yet, touting stronger reasoning and tool use than Sonnet 4.6 at a lower price point, but independent checks quickly surfaced regressions on benchmarks that fall outside the usual optimization targets.
A vocal AI developer highlighted weaker scores versus the previous Sonnet on non-targeted tests, prompting others to ask whether the new model even clears the bar set by GLM-5.2.
Some commentators argue the incremental gains do not justify the 5.0 label and might better suit a point release like 4.8, keeping the conversation focused on what actually moved forward.
Critics slammed Anthropic's Claude Sonnet 5 launch as a nerfed downgrade that delivers worse performance and a higher cost per task than prior versions such as Sonnet 4.6.
I knew it was going to be an insane nothingburger, because there's currently a soft ban on frontier capabilities
but I genuinely don't understand why they didn't call it Sonnet 4.8 or Sonnet 4.9, because this artificially nerfed piece of shit is not worthy of the 5.0 naming
Claude 5 has so far been the worst launch by Anthropic
Fable 5 isn't available and Sonnet 5 was nerfed to death
like does it even beat GLM-5.2?

turns out it does (barely) at like 3x the price

@0xVita @scaling01 One of these days, I need to watch these movies.

@scaling01 feels like haiku-5 instead, I think they should have gone with that
roughly matching sonnet-4.6 perf on medium for half the cost is cool, lot of enterprises are going to love that

@scaling01 Thank god the mahdi is here to tell us the truth we @Presidentlin were worried you got hit by a bus when you didn’t break the news first

@scaling01 i think you mean best launch?

@scaling01 We have entered the age of artificially-limited frontier model regression.
We can thank the gov for that.
US is no longer going to be a safe haven for rapid AI innovation, unfortunately.

@scaling01 I agree, it was an enormous letdown. It doesn't reach Opus capabilities and it's in the opposite Pareto frontier quadrant, the worst quadrant in token efficiency at least for the benchmarks

@scaling01 This is probably haiku size, Opus 4.6 was the original Sonnet 5. They are just greedy

@teortaxesTex Liang Wenfeng if you can hear me obliterate the evil west V4.1 PRO MAX.finalversion to put an end to this charade

@scaling01 then people would ask for sonnet 5… best they just pump out this filler model now and focus their attention on the big guns again, who cares about anything other than the SOTA

@scaling01 I opened up the system card, saw this, and stopped giving a fuck, lol

@scaling01 by a hair

@scaling01 What do you mean nerfed? Safeguards or just generally a weak model?

@scaling01 yeah, i'm fully disconnected from hype now having a great time, zooming through 31 tasks in 1 minute with open source models

@scaling01 Sounds like they gave the model a participation trophy and called it a “nothingburger” for extra flavor.

@scaling01 sad sonnet hymns

@scaling01 new world