7h ago

Anthropic's Opus 4.8 yields a 6% gain on DeepSWE benchmarks but still trails OpenAI's GPT 5.4

The updated model reduces the average cost per task.

Sentiment: 42.8% positive, 57.2% negative (33 comments with sentiment)

Positive users praise Opus 4.8's higher DeepSWE scores at lower cost as a real improvement, while negative users call the benchmark flawed and the model worse than alternatives or headed in the wrong direction.