Policies trained via imitation on real robot data can be surprisingly capable. But in domains like dexterous manipulation, they are often not quite good enough: they move slowly, miss grasps, make unreliable contact, and fail under small perturbations.
Can we improve them without collecting any additional data on the real robot?
In SCORE, we show that we can cheaply improve real-world diffusion/flow policies by using simulation simply to learn how to steer them at deployment. This leads to large gains in real-world success and speed across a variety of tasks, without requiring additional real-world experience:
https://weirdlabuw.github.io/score/
🧵 (1/10)
The most direct recipe for “real-to-sim-to-real” policy improvement is to take a real-world policy, put it in simulation, and run RL to improve it cheaply. But this often fails in contact-rich manipulation, because unconstrained RL exploits discrepancies between simulation and reality. Since simulators imperfectly model contact, friction, compliance, geometry, and force, RL finds solutions that work in simulation but underperform on real hardware. (2/10)


