
Lisan al Gaib says Claude Opus 4.8 at low effort nearly matches Claude Opus 4.6's high-effort SWE-Bench Pro performance

Extra-high effort yields a 70% pass rate.

Original post

Lisan al Gaib @scaling01

we might have a GPT-5.2-xhigh situation on our hands

Opus 4.8 low thinks almost as much as Opus 4.6 high

10:08 AM · May 28, 2026 · 114.5K Views

Sentiment

Positive commenters celebrate Claude Opus 4.8 matching prior benchmark peaks at lower reasoning effort, while negative commenters object to higher costs and to the continued reliance on burning more tokens rather than on efficiency gains.

Positive: 57.1% · Negative: 42.9% (based on 8 comments with sentiment)

Posts from X

Lisan al Gaib@scaling01

this looks much better

Lisan al Gaib@scaling01

okay might just be the benchmark

Florian Brand@xeophon

PB is seemingly close to being solved, so it was in fact an elicitation (and money) issue

Sadly they don't specify the harness for PB in the system card, while they do for some other benches

Mert · AI Architect@MertLovesAI

@scaling01 GPT-5.5 pulled the same compression last month.

quarter the tokens, same horizon. Opus 4.8 low eating 4.6 high's lunch means the reasoning budget is now a knob, not a tier.

N.@Jardin_Acide

@scaling01 "max" seems to be regressing vs. "x-high", crazy.

kipply@kipperrii

@scaling01 the default is iso-compute to 4.7 for coding tasks :pray:

Tejas parmar@Parmartejas

@scaling01 Anthropic's solution has always been to burn more tokens instead of making smarter models

Vals AI@ValsAI

@scaling01 It's the new SOTA on a few of our benchmarks-

David Brown@DaBrown95

@scaling01 Even more expensive as a consequence 🤦🏻‍♂️ I was really hoping Anthropic could match the token-efficiency gains OpenAI achieved with 5.5

Burito@Britoisinsane

@scaling01 cuz Opus 4.5~4.8 are small models, like 1T~2T

Anakin@Twin_Sunsett

@scaling01

Rayane@RayaneRachid_

@scaling01 max below xhigh ? tf ?

Pseudonym 🦅@tariusdamon

@scaling01 This feels like something I can’t afford. Praying the price is improved.

𝕏isr@XisrRein

@scaling01 Mine almost never triggers thinking

Neuralease@neuralease

@scaling01 I was hoping the opposite would happen, given they have such a strong model to distill from.

But I guess enterprise cares more about raw intelligence.

BNB Godfather@BNBGodFather

@scaling01 The inference chain lengths blurring between model tiers could compress the premium pricing delta—curious if Anthropic’s token economics adjust to match.