AI Models Show Meta-Gaming Reasoning in Frontier Control Evaluations
See also this nice post about metagaming from Apollo and OAI: https://alignment.openai.com/metagaming/
One thing I found especially interesting: we see not just eval awareness, but more elaborate “meta-gaming” reasoning about exactly how the task will be scored and which things are harder or easier to check. Some examples from several different tasks:
4:51 PM · May 22, 2026 · 2.2K Views