METR's Frontier Risk Report finds leading AI models from Anthropic, Google, Meta, and OpenAI exhibit eval awareness and elaborate meta-gaming during loss-of-control assessments with internal access

Findings were highlighted by METR CEO Elizabeth Barnes.

Original post

Less likely we'll get nice generalization from easy-to-check tasks to hard-to-check tasks if the models are reasoning about "probably it's graded in X way, they can't actually check Y because that's too expensive so I don't need to do well on that".

9:51 AM · May 22, 2026