
Opus 4.7 And GPT-5.5 Reach Nearly 90% On Agentic WeirdML

Original post: Florian Brand #1153

I ran Opus 4.7 and GPT-5.5 on an agentic version of WeirdML. Both models improved significantly, each scoring almost 90%. Opus improved the most, since it started from a lower base.

They had full access to the training data in a sandbox, but still had to submit code five times to be scored, as in regular WeirdML.

They achieved the higher scores mostly by scoring really well more consistently on each task, rather than by improving the SOTA on each task. For more details, see the Agentic WeirdML page on the website (link in thread).

1:48 AM · May 31, 2026 · 5.5K Views
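
The post's distinction between scoring well more consistently and improving the per-task best can be illustrated with a small arithmetic sketch. Everything here is an assumption made for illustration: the task names, the score values, and the two aggregation functions are invented, and this does not claim to reproduce how WeirdML actually aggregates submissions.

```python
from statistics import mean

# Hypothetical scores (fraction correct) for 5 submissions per task.
# These numbers are invented for illustration only.
baseline = {
    "task_a": [0.95, 0.40, 0.90, 0.30, 0.85],
    "task_b": [0.90, 0.20, 0.88, 0.50, 0.10],
}
agentic = {
    "task_a": [0.95, 0.92, 0.90, 0.93, 0.94],
    "task_b": [0.90, 0.85, 0.88, 0.87, 0.89],
}


def mean_score(runs):
    """Average over tasks of the mean score across submissions."""
    return mean(mean(scores) for scores in runs.values())


def best_score(runs):
    """Average over tasks of the best single submission."""
    return mean(max(scores) for scores in runs.values())


for name, runs in [("baseline", baseline), ("agentic", agentic)]:
    print(f"{name}: mean={mean_score(runs):.3f} best={best_score(runs):.3f}")
```

In this made-up data the best submission per task is identical in both settings, yet the averaged score rises sharply because the weak submissions disappear. That is the kind of pattern the post describes: a higher overall score without a higher per-task ceiling.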