WeirdML benchmark finds Claude Opus 4.8 xhigh narrowly trails GPT-5.5 xhigh but reaches 82.9% accuracy using 129 lines of code, versus 517 for GPT-5.5

Disabling thinking dropped Claude's accuracy to 70.5%.

Posts from X
Lisan al Gaib (@scaling01)

Opus 4.8-xhigh scores minimally lower than GPT-5.5-xhigh, but is absolutely simplicity-maxxing

129 LOC vs 517 LOC

I know which one I would pick
