4h ago

Claude Opus 4.8 achieves 58% Pass@1 on the DeepSWE coding benchmark, trailing GPT-5.5 but leading on cost efficiency

The score is below the model's 69.2% result on SWE-Bench.

Sentiment: 100% positive, 0% negative (6 comments)

Users express trust in the DeepSWE benchmark, saying that its ranking of Claude Opus 4.8 behind GPT-5.5 matches their qualitative experience and the models' real-world performance.