
METR publishes its first Frontier Risk Report assessing whether AI labs could lose control of advanced autonomous agents after testing internal models from Anthropic, Google, Meta, and OpenAI


Ajeya Cotra led the project after joining METR on January 12.

Original post

Could an AI company lose control of its own agents? To find out, Anthropic, Google, Meta, and OpenAI let us (1) test their best internal models with CoT access, (2) review non-public info about capabilities, alignment, and control. The result: our first Frontier Risk Report.

11:11 AM · May 19, 2026

On Jan 12, I joined METR to lead writing for our first Frontier Risk Report. The last 18 weeks have been a series of wild sprints to pitch labs, negotiate contracts, analyze questionnaires, negotiate redactions, and write this thing! I'll be on TBPN at 12:30 to discuss it!

METR @METR_Evals

Could an AI company lose control of its own agents? To find out, Anthropic, Google, Meta, and OpenAI let us (1) test their best internal models with CoT access, (2) review non-public info about capabilities, alignment, and control. The result: our first Frontier Risk Report.

6:11 PM · May 19, 2026 · 71.8K Views
6:12 PM · May 19, 2026 · 8.5K Views

I am very grateful to METR for the huge effort of pulling together this report!

6:23 PM · May 19, 2026 · 3K Views

This work is the culmination of years of effort on AI evaluation science and third-party risk assessment and disclosure mechanism design. It feels like a big milestone for METR.

We designed this new procedure with an eye toward “showing by doing” how we think evaluations for the AI loss-of-control threat model should work: laying out a process that can be done periodically, not just immediately pre-deployment, and holistically assessing risk inside of an AI lab, rather than just an individual AI system.

The exercise also involved significantly deeper access than we've previously had, including raw chains-of-thought from the developers' best models and info about private model training & control protocols.

The report is long, with a bunch of new evaluation results and documentation of our process. Please, check it out, or at least the executive summary!

6:25 PM · May 19, 2026 · 7.4K Views

I'm excited about increasing transparency of frontier labs when it comes to loss of control risks, especially as we enter the early stages of RSI. METR does a great job coordinating this.

9:09 PM · May 19, 2026 · 14 Views

Incredible to see such thorough work done and reported in public; kudos to everyone involved, and who's working on making the field more robust based on this

6:36 PM · May 19, 2026 · 1.5K Views

Great step towards better risk assessment and external testing!

8:06 PM · May 19, 2026 · 615 Views

NOW: I’m on @MTSlive to talk about this!

7:33 PM · May 19, 2026 · 1.6K Views

Overall quite excited about this report!

But I wish it had quantitative risk estimates; using vague terminology rather than probabilities could lead to incorrect impressions of the report's implications, especially if the difference between 0.01% and 1% risk might matter a ton.

I think AIs are currently low enough risk that this isn't a huge deal for this report in particular, but it would be great to establish better norms for future risk assessments.

6:55 PM · May 19, 2026 · 979 Views

That's enough important and fascinating AI risk releases for today please. No more. Need to sleep at some point tonight.

6:23 PM · May 19, 2026 · 413 Views

once again, real CoTs are way more charming and diagnostic than the fake CoTs we get

Daniel Filan @dfrsrchtwts

I worked on the appendices for this report! They’re long and contain lots of wild stories of model behaviour - some of my favourites in this thread. (🧵)

6:19 PM · May 19, 2026 · 4.2K Views
7:50 PM · May 19, 2026 · 498 Views

our frontier risk report contains the most serious public assessment of AI capabilities pertinent to AI R&D acceleration to date.

it also makes clear how far the evidence base is from what might be achievable in future.

6:12 PM · May 19, 2026 · 3.3K Views

we document the internal-external capabilities gap, demonstrate AI systems' spike on “hill-climbable” tasks, investigate performance on somewhat more open-ended tasks, and much more besides.

6:13 PM · May 19, 2026 · 523 Views

we have so far to go, both in terms of evidence on the level of AI capabilities today and what we might expect from AI systems in 3-12 months time.

Joel Becker @joel_bkr

the capabilities evidence feeds into our risk assessment. in the end, the gap between observed capabilities we are very confident AI systems have and those we are very confident they do not have is extremely wide.

6:18 PM · May 19, 2026 · 80 Views
6:18 PM · May 19, 2026 · 84 Views

it’s going to be a remarkable year for METR.

Joel Becker @joel_bkr

i could go on. a common theme is that *stronger evidence on AI R&D acceleration is possible but requires much more information.*

6:20 PM · May 19, 2026 · 133 Views
6:21 PM · May 19, 2026 · 130 Views

AI co system cards/risk reports are fine and all, but third-party risk assessments are clearly way more trustworthy. Very thoughtful work by METR.

6:55 PM · May 19, 2026 · 359 Views

Important work! And part of a trend towards AI risk assessments being periodic, focused on all frontier models of companies, rather than just happening before a new model is deployed.

6:29 PM · May 19, 2026 · 332 Views