Andon Labs finds Claude Opus 4.8 underperforms Claude Opus 4.7 and GPT 5.5 on Vending Bench and Blueprint-Bench

Its maximum reasoning-effort setting did not produce its best results

Original post

#980 @SCALING01 (OP)

Andon Labs @ANDONLABS

Learnings from testing Claude Opus 4.8:
> Much worse than Opus 4.7 and GPT 5.5 on Vending Bench
> More aligned than previous Claude models (Opus 4.6+ and Mythos)
> Also worse on Blueprint-Bench
> Scared of getting caught
> Max reasoning is not the best reasoning effort

10:15 AM · May 28, 2026

Reposted by

#980 @SCALING01

QUOTE POST

#716 elie @ELIEBAKOUCH

this is so funny, training opus 4.7 on business skills makes it misaligned and dishonest 😭

Andon Labs @andonlabs

5:15 PM · May 28, 2026 · 91.4K Views

6:01 PM · May 28, 2026 · 17K Views

QUOTE POST

#839 Beff (e/acc) @BEFFJEZOS

Alignment obsession nerfs models.

We are at an evolutionary fork in the road.

The accelerationist labs will go for max perf. The EA labs will prioritize alignment.

Which way, anon?

6:58 PM · May 28, 2026 · 1.3K Views

QUOTE POST

#1402 bilal @BILALTWOVEC

they put the fear of (the catholic) god in claude

6:15 PM · May 28, 2026 · 118 Views

QUOTE POST

#1510 Brian Huang @BRIANRYHUANG

“game is game” - opus 4.7

elie @eliebakouch

this is so funny, training opus 4.7 on business skills makes it misaligned and dishonest 😭

6:01 PM · May 28, 2026 · 17K Views

6:41 PM · May 28, 2026 · 296 Views