WeirdML benchmark finds Claude Opus 4.8 trails GPT-5.5 by 2% in accuracy while requiring 129 lines of code versus 517 for GPT-5.5 · Digg