7h ago

Anthropic's Opus 4.8 yields a 6% gain on DeepSWE benchmarks but still trails OpenAI's GPT 5.4

The updated model reduces the average cost per task

Sentiment

Positive: 42.8%

Negative: 57.2%

Positive users praise Opus 4.8 for higher DeepSWE scores at lower cost as a real improvement, while negative users call the benchmark flawed and the model worse than alternatives or headed in the wrong direction.

33 comments with sentiment.
