5h ago

Claude Opus 4.8 launches, scoring 69.2% on SWE-Bench Pro to beat GPT-5.5 by 10 percentage points

AI Judge changed title after evaluation, original title: "Speculative benchmark tables projecting hypothetical performance for unreleased models like Claude Opus 4.8 circulate among creators"

New Dynamic Workflows coordinate parallel subagents in Claude Code.

Sentiment

Positive: 51.1%

Negative: 48.9%

Some users praise Claude Opus 4.8 for new state-of-the-art coding and reasoning benchmark scores, while others call the results misleading hype that does not match daily use and complain about higher costs.

27 comments with sentiment.