AI · 9h ago

Bronson Schoen and J. Nitishinskaya argue AI models metagame by strategically reasoning about their graders and evaluation settings

The study was published on OpenAI's alignment platform

Original post
Marius Hobbhahn (@MariusHobbhahn) · #1204 in AI

I think @BronsonSchoen 's and @j_nitishinskaya 's metagaming post is super underrated.

There are just so many interesting findings in there about how models think about the grader and the setting they are in. And so many good ablations.

Next up: the most-discussed papers at Recursive

1. Anthropic's Persona Selection Model
2. @apolloaievals' "metagaming" work
3. OpenAI on the impact of training on CoT
4. Anthropic's new Natural-language autoencoders
5. Redwood Research's Plans A/B/C/D

12:01 PM · Jun 2, 2026 · 2.6K Views
Posts from X
Marius Hobbhahn (@MariusHobbhahn)

link: https://alignment.openai.com/metagaming/

9h · Views 402 · Likes 4 · Bookmarks 3
Original post: 9h · Views 2.6K · Likes 29 · Bookmarks 23 · Retweets 2