OpenAI's Micah Carroll claims deployment simulations enable 99.9% safety monitoring coverage and realistic pre-deployment misalignment evaluations · Digg

Posts from X

Micah Carroll@MicahCarroll

As far as I know, we have the most monitoring coverage (99.9%) and most realistic pre-deployment evaluations of misalignment (via deployment simulation) in the industry. This is great, even if what it surfaces can sometimes be uncomfortable to share.

Micah Carroll@MicahCarroll

As meta-commentary, I’m very appreciative that we were able to share these results. It's crucial for us to be transparent in what we find in order to bring people along with how misalignment changes & manifests itself over time.

Miles Brundage@Miles_Brundage

Yeah - AFAICT OpenAI is leading on this particular thing* + I appreciate them sharing bad stuff they surface via this monitoring

*"safety leadership" is multidimensional - Ant is ahead on interp, GDM on "articulating a real safety worldview/plan," etc.

Micah Carroll@MicahCarroll

If you want to work on these problems, you should consider joining our team – we drove most of these results and others (e.g. CoT controllability). Having our measurements and safeguards keep up with growing capabilities is a must.

https://openai.com/careers/researcher-recursive-self-improvement-safety-san-francisco/

Micah Carroll@MicahCarroll

For more information about our internal misalignment monitoring system, see our blog from a couple of months ago: https://openai.com/index/how-we-monitor-internal-coding-agents-misalignment/

For more information on deployment simulation, see our recent blog and paper: https://openai.com/index/deployment-simulation/

Tomek Korbak@tomekkorbak

Also, if you're interested in helping us align and monitor superhuman AI agents, consider applying to our team: https://openai.com/careers/researcher-recursive-self-improvement-safety-san-francisco/

Tomek Korbak@tomekkorbak

For more details and more context on CoT controllability, see our system card https://deploymentsafety.openai.com/gpt-5-6-preview/cot-controllability

Aradhye ✈️ ICML'26@AradhyeAgarwal

@tomekkorbak Applied!

Đoc@ponzibaron

@MicahCarroll very good tweet and thanks for being so transparent about everything

now more importantly...does it still have to be told not to talk about goblins?

Tenobrus@tenobrus

@MicahCarroll yeah this is really impressive stuff you guys are doing great work

zane@zanytsl

@MicahCarroll Great work from your team, really appreciate you sharing so much also!

Hunter Gon@gonlenidefi

@Miles_Brundage it's interesting how "safety leadership" got split into categories now

wonder if this keeps the labs from competing on everything at once