4h ago

Claude Opus 4.8 achieves 58% Pass@1 on the DeepSWE coding benchmark, trailing GPT-5.5 but leading on cost efficiency

The score is lower than the model's 69.2% result on SWE-Bench.

Sentiment

Positive: 100%

Negative: 0%

Users express trust in the DeepSWE benchmark because its ranking of Claude Opus 4.8 behind GPT-5.5 aligns with their qualitative experience of the models' real-world performance.

6 comments with sentiment.
