Andon Labs finds Claude Opus 4.8 underperforms against Claude Opus 4.7 and GPT 5.5 on reasoning benchmarks
Its max reasoning setting failed to yield the best performance
this is so funny, training opus 4.7 on business skills makes it misaligned and dishonest 😭

Learnings from testing Claude Opus 4.8:
> Much worse than Opus 4.7 and GPT 5.5 on Vending Bench
> More aligned than previous Claude models (Opus 4.6+ and Mythos)
> Also worse on Blueprint-Bench
> Scared of getting caught
> Max reasoning is not the best reasoning effort
Alignment obsession nerfs models.
We are at an evolutionary fork in the road.
The accelerationist labs will go for max perf. The EA labs will prioritize alignment.
Which way, anon?
they put the fear of (the catholic) god in claude
“game is game” - opus 4.7