Researchers introduce Simulation Distillation to pretrain world models in simulation for rapid adaptation in real-world robotic reinforcement learning

POST

#525Abhishek Gupta@ABHISHEKUNIQUE7

Punchline: distill world models from simulation to enable fast, stable real-world robot adaptation.

Simulation is nearly always wrong. But in Simulation Distillation, we ask a simple question:

How do we perform simulation pretraining such that real-world adaptation becomes trivially easy?

sim-dist.github.io

/

Let's take a closer look (1/n)

9:03 PM · May 19, 2026 · 6.5K Views

REPLY

#525Abhishek Gupta@ABHISHEKUNIQUE7

Real-world RL is still pretty hard for long-horizon, contact-rich robotics. A few minutes of robot data is not enough to relearn rewards, values, representations, dynamics, and action selection end-to-end.

And when we try, finetuning can destroy useful structure learned during pretraining. (2/n)

Abhishek Gupta@abhishekunique7

Punchline: distill world models from simulation to enable fast, stable real-world robot adaptation. Simulation is nearly always wrong. But in Simulation Distillation, we ask a simple question: How do we perform simulation pretraining such that real-world adaptation becomes trivially easy? http://sim-dist.github.io Let's take a closer look (1/n)

9:03 PM · May 19, 2026 · 6.5K Views

9:03 PM · May 19, 2026 · 548 Views

REPLY

#525Abhishek Gupta@ABHISHEKUNIQUE7

The key idea in SimDist is simple:

Move the hard RL problem into simulation, and transfer a high-coverage latent world model rather than only a policy.

In simulation, we can train this model with privileged state, dense rewards, value functions, resets, perturbations, failures, and recoveries at scale. (3/n)

Abhishek Gupta@abhishekunique7

Real-world RL is still pretty hard for long-horizon, contact-rich robotics. A few minutes of robot data is not enough to relearn rewards, values, representations, dynamics, and action selection end-to-end. And when we try, finetuning can destroy useful structure learned during pretraining. (2/n)

9:03 PM · May 19, 2026 · 548 Views

9:03 PM · May 19, 2026 · 471 Views

REPLY

#525Abhishek Gupta@ABHISHEKUNIQUE7

At deployment, we can then freeze the parts that encode task structure — the encoder, reward model, and value function — and only adapt the latent dynamics model where the simulation imperfections lie. Performance is then naturally improved, just by performing short-horizon planning in this adapted model.

So real-world adaptation becomes literally short-horizon supervised learning than long-horizon RL from scratch. (4/n)

Abhishek Gupta@abhishekunique7

The key idea in SimDist is simple: Move the hard RL problem into simulation, and transfer a high-coverage latent world model rather than only a policy. In simulation, we can train this model with privileged state, dense rewards, value functions, resets, perturbations, failures, and recoveries at scale. (3/n)

9:03 PM · May 19, 2026 · 471 Views

9:03 PM · May 19, 2026 · 254 Views

REPLY

#525Abhishek Gupta@ABHISHEKUNIQUE7

This is especially important for contact-rich tasks. The value function may already know that “peg aligned with hole” or “leg has stable foothold” is good, capturing the global structure of the task solution. What changes in the real world is often the details of local dynamics: friction, compliance, slip, contact, calibration. SimDist adapts exactly this part using real world data. (5/n)

Abhishek Gupta@abhishekunique7

At deployment, we can then freeze the parts that encode task structure — the encoder, reward model, and value function — and only adapt the latent dynamics model where the simulation imperfections lie. Performance is then naturally improved, just by performing short-horizon planning in this adapted model. So real-world adaptation becomes literally short-horizon supervised learning than long-horizon RL from scratch. (4/n)

9:03 PM · May 19, 2026 · 254 Views

9:03 PM · May 19, 2026 · 213 Views

REPLY

#525Abhishek Gupta@ABHISHEKUNIQUE7

Visualizing the value function is useful here, it transfers quite well and capture successes and failures as well as the orderings of which state is best. Importantly, this value function is trained with high-coverage, so only local repair of dynamics is needed - no OOD extrapolation errors! (6/n)

Abhishek Gupta@abhishekunique7

This is especially important for contact-rich tasks. The value function may already know that “peg aligned with hole” or “leg has stable foothold” is good, capturing the global structure of the task solution. What changes in the real world is often the details of local dynamics: friction, compliance, slip, contact, calibration. SimDist adapts exactly this part using real world data. (5/n)

9:03 PM · May 19, 2026 · 213 Views

9:03 PM · May 19, 2026 · 194 Views

REPLY

#525Abhishek Gupta@ABHISHEKUNIQUE7

One detail that matters during pre-training: Simulation data should not only contain successful expert trajectories. A planner asks counterfactual questions at test time: What if I push here? What if I slip? So the world model needs failures, perturbations, and recoveries too.

This is where simulation is super useful. We can cheaply generate the messy off-policy experience that would be painful or unsafe to collect on real robots, then distill it into a model that supports real-world planning and rapid adaptation. (7/n)

Abhishek Gupta@abhishekunique7

Visualizing the value function is useful here, it transfers quite well and capture successes and failures as well as the orderings of which state is best. Importantly, this value function is trained with high-coverage, so only local repair of dynamics is needed - no OOD extrapolation errors! (6/n)

9:03 PM · May 19, 2026 · 194 Views

9:03 PM · May 19, 2026 · 176 Views

REPLY

#525Abhishek Gupta@ABHISHEKUNIQUE7

The perspective I find useful: Simulation gives broad structure and coverage, real-world data fixes the local details. SimDist reduces adaptation to just what needs to change, while avoiding many of the extrapolation and instability issues that appear in offline-to-online RL finetuning. (9/n)

Abhishek Gupta@abhishekunique7

Across precise manipulation and quadruped locomotion, SimDist adapts from just 15-30 min of real-world data by planning with transferred reward/value structure and finetuning only dynamics. The adaptation is stable because it is supervised learning, rather than online RL over the full policy/value/reward stack. (8/n)

9:03 PM · May 19, 2026 · 186 Views

9:03 PM · May 19, 2026 · 169 Views

REPLY

#525Abhishek Gupta@ABHISHEKUNIQUE7

Across precise manipulation and quadruped locomotion, SimDist adapts from just 15-30 min of real-world data by planning with transferred reward/value structure and finetuning only dynamics.

The adaptation is stable because it is supervised learning, rather than online RL over the full policy/value/reward stack. (8/n)

Abhishek Gupta@abhishekunique7

One detail that matters during pre-training: Simulation data should not only contain successful expert trajectories. A planner asks counterfactual questions at test time: What if I push here? What if I slip? So the world model needs failures, perturbations, and recoveries too. This is where simulation is super useful. We can cheaply generate the messy off-policy experience that would be painful or unsafe to collect on real robots, then distill it into a model that supports real-world planning and rapid adaptation. (7/n)

9:03 PM · May 19, 2026 · 176 Views

9:03 PM · May 19, 2026 · 186 Views

REPLY

#525Abhishek Gupta@ABHISHEKUNIQUE7

Big kudos to Jacob Levy and @ty_westenbroek for leading this, along with Kevin Huang, Fernando Palafox, @patrickhyin , Dong-Ki Kim, Shayegan Omidshafiei, and David Fridovich-Keil.

They also made a fantastic website to help understand the work, where you can play with model behavior and performance:

Website: http://sim-dist.github.io Paper: https://arxiv.org/abs/2603.15759

See you at RSS 2026 to talk more about this work! (10/n)

Abhishek Gupta@abhishekunique7

The perspective I find useful: Simulation gives broad structure and coverage, real-world data fixes the local details. SimDist reduces adaptation to just what needs to change, while avoiding many of the extrapolation and instability issues that appear in offline-to-online RL finetuning. (9/n)

9:03 PM · May 19, 2026 · 169 Views

9:03 PM · May 19, 2026 · 203 Views

REPLY

#525Abhishek Gupta@ABHISHEKUNIQUE7

Special note to highlight that @ty_westenbroek is on the job market for research scientist roles! Hire him, he does fantastic work :)

Abhishek Gupta@abhishekunique7

Big kudos to Jacob Levy and @ty_westenbroek for leading this, along with Kevin Huang, Fernando Palafox, @patrickhyin , Dong-Ki Kim, Shayegan Omidshafiei, and David Fridovich-Keil. They also made a fantastic website to help understand the work, where you can play with model behavior and performance: Website: http://sim-dist.github.io Paper: https://arxiv.org/abs/2603.15759 See you at RSS 2026 to talk more about this work! (10/n)

9:03 PM · May 19, 2026 · 203 Views

9:03 PM · May 19, 2026 · 255 Views

REPLY

#525Abhishek Gupta@ABHISHEKUNIQUE7

@ty_westenbroek Sim2real 🤝 World Models 🤝 RL, we have all the buzzwords! 🎉🎉🎉🥳

Abhishek Gupta@abhishekunique7

Special note to highlight that @ty_westenbroek is on the job market for research scientist roles! Hire him, he does fantastic work :)

9:03 PM · May 19, 2026 · 255 Views

9:23 PM · May 19, 2026 · 207 Views

REPLY

#525Abhishek Gupta@ABHISHEKUNIQUE7

@Vikashplus Currently with a known task distribution, since it needs to get the VFs in simulation. But would be fun to get rid of that requirement :)

Vikash Kumar@Vikashplus

@abhishekunique7 Do we need to know the task distribution, or can it be trained agnostic to the distribution ahead of times?

10:03 PM · May 19, 2026 · 142 Views

10:05 PM · May 19, 2026 · 124 Views

REPLY

#1450Vikash Kumar@VIKASHPLUS

@abhishekunique7 Do we need to know the task distribution, or can it be trained agnostic to the distribution ahead of times?

Abhishek Gupta@abhishekunique7

Punchline: distill world models from simulation to enable fast, stable real-world robot adaptation. Simulation is nearly always wrong. But in Simulation Distillation, we ask a simple question: How do we perform simulation pretraining such that real-world adaptation becomes trivially easy? http://sim-dist.github.io Let's take a closer look (1/n)

9:03 PM · May 19, 2026 · 6.5K Views

10:03 PM · May 19, 2026 · 142 Views

QUOTE POST

#1456Michael Dennis@MICHAELD1729

the shortest path to general robotics could be leveraging learned world models to cross the sim2real gap

Tyler Westenbroek@ty_westenbroek

Real-world RL is still too brittle and data-hungry for long-horizon, contact-rich tasks. We introduce Simulation Distillation (SimDist), which turns large-scale simulated experience into reusable world-model priors for rapid real-world adaptation. By combining online planning with dynamics adaptation, SimDist achieves high success rates on tasks requiring precision, force, and reactivity. Play with our interactive visualization to see for yourself: https://sim-dist.github.io (1/n)

4:06 PM · May 19, 2026 · 14.8K Views

6:13 PM · May 19, 2026 · 8.9K Views

Researchers introduce Simulation Distillation to pretrain world models in simulation for rapid adaptation in real-world robotic reinforcement learning

Sentiment

Cluster engagement