3h ago

WeirdML benchmark finds Claude Opus 4.8 trails GPT-5.5 by 2% accuracy while requiring 129 lines of code compared to 517

Claude Opus 4.8 costs $2.35 per run versus $2.57 for GPT-5.5.

Sentiment

Positive: 59.5%

Negative: 40.5%

Positive users praise Claude Opus 4.8 for nearly matching GPT-5.5 on WeirdML with far less code and lower future maintenance, while negative users dismiss the results or lament the lack of excitement for other models.

16 comments with sentiment.
