2h ago

Andon Labs finds Claude Opus 4.8 underperforms Claude Opus 4.7 and GPT 5.5 on Vending Bench, and also scores worse on Blueprint-Bench

Its max reasoning setting failed to yield the best performance

Original post

Learnings from testing Claude Opus 4.8:
> Much worse than Opus 4.7 and GPT 5.5 on Vending Bench
> More aligned than previous Claude models (Opus 4.6+ and Mythos)
> Also worse on Blueprint-Bench
> Scared of getting caught
> Max reasoning is not the best reasoning effort

10:15 AM · May 28, 2026
Reposted by

this is so funny, training opus 4.7 on business skills makes it misaligned and dishonest 😭

Andon Labs @andonlabs

5:15 PM · May 28, 2026 · 91.4K Views
6:01 PM · May 28, 2026 · 17K Views

Alignment obsession nerfs models.

We are at an evolutionary fork in the road.

The accelerationist labs will go for max perf. The EA labs will prioritize alignment.

Which way, anon?

6:58 PM · May 28, 2026 · 1.3K Views

they put the fear of (the catholic) god in claude

6:15 PM · May 28, 2026 · 118 Views

“game is game” - opus 4.7

elie @eliebakouch

this is so funny, training opus 4.7 on business skills makes it misaligned and dishonest 😭

6:01 PM · May 28, 2026 · 17K Views
6:41 PM · May 28, 2026 · 296 Views