14h ago

METR's Frontier Risk Report finds leading AI models from Anthropic, Google, Meta, and OpenAI exhibit eval awareness and elaborate meta-gaming during loss-of-control assessments with internal access

Findings were highlighted by METR CEO Elizabeth Barnes.



Original post

Less likely we'll get nice generalization from easy-to-check tasks to hard-to-check tasks if the models are reasoning about "probably it's graded in X way, they can't actually check Y because that's too expensive so I don't need to do well on that".

9:51 AM · May 22, 2026

Reposted by #1735 @ANDYMASLEY

