12h ago

Opus 4.8 sets a record score on BullshitBench, rebounding from version 4.7's decline in resisting sycophancy

The high scores mean the benchmark now requires harder questions

Original post

Top notch result from Opus 4.8 on BullshitBench, after a slight dip with 4.7. Need to start thinking of some new harder questions soon!

2:38 AM · May 29, 2026