Opus 4.8 Shows No Proofreading Gains On ErrataBench Over 4.7

OK, here are the Opus 4.8 results for ErrataBench, a benchmark that measures how well LLMs can find and fix errors in English text. Some highlights: 1. Proofreading quality: no improvement over Opus 4.7, and it still lags behind GPT. It performs best with no reasoning.

1:11 PM · May 29, 2026