WeirdML benchmark finds Claude Opus 4.8 xhigh trails GPT-5.5 xhigh but achieves 82.9% accuracy using 129 lines of code · Digg