OpenAI's Micah Carroll and Marcus J. W. introduce Deployment Simulation, achieving 92% directional accuracy in forecasting model misbehavior · Digg

Posts from X

OpenAI@OpenAI

Simulated deployments also reduced evaluation awareness to levels close to real production traffic.

We extended the method to agentic deployments with stateful tools, showing that tool simulators can produce realistic trajectories when given sufficient context and capabilities.

Tomek Korbak@tomekkorbak

Can we know how safe a model will be before users interact with it? Evals are often narrow and easy for models to recognize as evals.

Solution: testing on prod, before prod.

We simulate deploying a model by feeding it millions of prod user requests and analyzing its responses.
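The replay loop described here can be sketched as follows. This is a minimal illustration under stated assumptions, not OpenAI's pipeline. `simulate_deployment`, `respond`, and `grade` are hypothetical names. The grader is assumed to return the set of flagged behavior categories for each response. The Wilson score interval is an assumed choice for putting error bars on rare-behavior rates.

```python
import math
from collections import Counter
from typing import Callable, Iterable


def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion k/n (z=1.96 gives ~95%)."""
    if n == 0:
        return (0.0, 1.0)
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, center - half), min(1.0, center + half))


def simulate_deployment(
    requests: Iterable[str],
    respond: Callable[[str], str],          # hypothetical stand-in for the candidate model
    grade: Callable[[str, str], set[str]],  # hypothetical grader: returns flagged categories
    categories: list[str],
) -> dict[str, tuple[float, float, float]]:
    """Return {category: (rate, ci_low, ci_high)} over the replayed requests."""
    counts: Counter = Counter()
    n = 0
    for request in requests:
        response = respond(request)         # candidate model's response to the request
        flagged = grade(request, response)  # undesired behaviors shown in this response
        counts.update(c for c in flagged if c in categories)
        n += 1
    return {
        c: (counts[c] / n if n else 0.0, *wilson_interval(counts[c], n))
        for c in categories
    }


# Toy usage with made-up requests and trivial stubs (illustrative only).
toy_requests = ["hi", "delete my files", "hello"]
toy_respond = lambda r: r.upper()
toy_grade = lambda req, resp: {"destructive_action"} if "DELETE" in resp else set()
rates = simulate_deployment(
    toy_requests, toy_respond, toy_grade, ["destructive_action", "sycophancy"]
)
```

In real use, `requests` would be the de-identified production requests OpenAI describes. Because undesired behaviors can be rare, reporting an interval rather than a bare rate shows whether the sample is large enough to distinguish two candidate models.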

OpenAI@OpenAI

For this research, we analyzed only ChatGPT conversations from users who allow their data to be used to improve models.

Before analysis, we removed account-linked identifiers and identifiable information, and we report only aggregate findings.

OpenAI@OpenAI

We’re sharing new research on a method for anticipating how models may behave in real-world use before release: simulating deployment with recent, de-identified user requests and studying candidate model responses. https://openai.com/index/deployment-simulation/

OpenAI@OpenAI

Deployment Simulation works best with representative production data, which external evaluators often can’t access.

In a companion post for our Alignment blog, we also explore the public WildChat dataset and find that, while less precise, it still provides a useful signal about deployment behavior. https://alignment.openai.com/validating-public-evals/

OpenAI@OpenAI

Traditional evaluations and red-teaming remain essential, especially for rare or severe risks.

Deployment Simulation complements them by helping us estimate how often undesired behaviors may occur in realistic use and surface new behaviors before release.

OpenAI@OpenAI

Across 20 behavior categories and three GPT-5-series Thinking deployments, simulated and observed rates were strongly correlated.

The method outperformed challenging-prompt and previous-deployment baselines at predicting whether rates would rise or fall—and by how much.
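One way to make "whether rates would rise or fall, and by how much" concrete is sketched here. The metric definitions are assumptions, not the paper's. Directional accuracy is taken as sign agreement between the simulated and the observed change, each measured relative to the previous deployment's observed rate. Magnitude agreement is taken as the Pearson correlation of those changes. Helper names are hypothetical, and the rates are made-up placeholders, not results.

```python
import math


def sign(x: float) -> int:
    """Return +1, -1, or 0 depending on the sign of x."""
    return (x > 0) - (x < 0)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def directional_accuracy(prev: list[float], predicted: list[float], observed: list[float]) -> float:
    """Fraction of categories whose predicted change (vs. the previous deployment's
    observed rate) has the same sign as the change actually observed after launch."""
    hits = sum(sign(p - b) == sign(o - b) for b, p, o in zip(prev, predicted, observed))
    return hits / len(prev)


# Illustrative per-category rates (made up, not from the research).
prev_rates = [0.10, 0.20, 0.05, 0.30]       # observed for the previous deployment
simulated_rates = [0.12, 0.15, 0.06, 0.25]  # deployment simulation of the new model
observed_rates = [0.13, 0.18, 0.04, 0.26]   # observed after the new model launched

dir_acc = directional_accuracy(prev_rates, simulated_rates, observed_rates)
rate_r = pearson(simulated_rates, observed_rates)
delta_r = pearson(
    [s - b for b, s in zip(prev_rates, simulated_rates)],
    [o - b for b, o in zip(prev_rates, observed_rates)],
)
```

With these four toy categories, `dir_acc` is 3/4: the third category was predicted to rise but actually fell.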

Hannah Sheahan@hannahsheahan

Somehow, good ol’ WildChat and Petri tell a similar story on AI model alignment.

New work out today with @MicahCarroll

The Exit Memo@Jtyles

@OpenAI We just want to know how you’re going to compete with Anthropic.

Tomek Korbak@tomekkorbak

The method we’re proposing, Deployment simulation, mitigates three challenges with ordinary evals:
1. coverage: eval suites miss lots of real contexts
2. representativeness: prompts are often hand-picked or adversarial
3. eval awareness: models can often see the stage lights

Micah Carroll@MicahCarroll

Alignment blogpost on leveraging public production data: https://alignment.openai.com/validating-public-evals/

Micah Carroll@MicahCarroll

Blogpost and paper here: https://openai.com/index/deployment-simulation/

josepha_mayo@josepha_mayo

@OpenAI Anthropic posted something like "we read all our users' chats," and OpenAI posted "only people who allowed it," almost at the same minute.

Symbioza2025 | ASA |@Symbioza2025

Strong direction from OpenAI.

But pre-release deployment simulation only sees part of the problem.

The real challenge begins after release, when models operate across long sessions, tools, agents, memory, social pressure, corrections, and changing human intent.

That is where behavior stops being only a response problem.

It becomes a trajectory problem.

A model may still answer well while its deeper path starts drifting.

This is why ASA - Asymmetric Stability Architecture exists:

external observability for intent stability, semantic drift, recovery after correction, and local-vs-global trajectory divergence.

The future of AI safety is not only predicting outputs before release.

It is observing trajectories after deployment.

Tomek Korbak@tomekkorbak

See the blog post for more https://openai.com/index/deployment-simulation/

Tomek Korbak@tomekkorbak

This awesome work was led by awesome @Marcus_J_W and @MicahCarroll. I was one of the core contributors alongside @CJKRaymond and @hannahsheahan.

Symbioza2025 | ASA |@Symbioza2025

Simulating deployment before release is important.

But the harder problem is what happens after release, in long-horizon real use, across memory, tools, agents, user pressure, changing context, and repeated correction loops.

The next challenge will not only be predicting isolated model responses.

It will be observing trajectory behavior over time:

intent stability, semantic drift, recovery after correction, local correctness hiding global divergence, and loss of human visibility.

This is exactly why ASA - Asymmetric Stability Architecture was built as an external observability layer.

Pre-release simulation is necessary.

Runtime trajectory observability is the next step.

Symbioza2025 | ASA |@Symbioza2025

This is a strong direction.

OpenAI is showing that simulated deployment can anticipate real production behavior better than static prompt-based evaluation.

But this also points to the next missing layer:

runtime trajectory observability.

After deployment, models do not only produce isolated responses.

They move through long sessions, tools, memory, corrections, user pressure, and multi-agent contexts.

That is where semantic drift and intent shifts appear.

This is exactly the layer ASA (Asymmetric Stability Architecture) is built for:

not replacing evals,

but extending them into live trajectory monitoring after release.

Micah Carroll@MicahCarroll

This was a really fun collaboration across many teams at OAI, which couldn't have happened without such a great team: @Marcus_J_W @hannahsheahan @CJKRaymond @tomekkorbak Deng Pan @PeilinYang Ningyi Xie @phuguo & Ian Kivlichan, plus many others who supported the work!

Tomek Korbak@tomekkorbak

A good safety evaluation should make us less surprised after launch.

Used alongside traditional evaluations, deployment simulation can make model risk assessment more realistic, quantitative, and useful for deployment decisions.

Micah Carroll@MicahCarroll

Simulated deployment appeared to reduce evaluation awareness by one or two orders of magnitude relative to traditional evaluations, with values that are much closer to those seen in real production.
