SCORE Improves Robot Policies Using Broad Suboptimal Data Coverage


Abhishek Gupta @abhishekunique7

Now, SCORE is not magic; there are clear limitations. It improves the real-world policy by choosing better behaviors inside the base policy's support, but it cannot create behaviors that were never present in the data. This makes it reliant on the base policy having a certain level of coverage and capability. (9/10)

Abhishek Gupta @abhishekunique7

What is pretty cool is that SCORE does not require perfect (or even near-perfect) base policies for successful improvement. What matters is the base policy's coverage: failures, recoveries, and play data can all expand what the real-world policy is able to do.

Even if the base policy does not use these behaviors reliably for zero-shot success, SCORE can learn to steer towards them in simulation.

Some of our most robust policies came not from cleaner datasets, but from broader ones! (8/10)

12:16 PM · Jul 2, 2026 · 103 Views


Abhishek Gupta @abhishekunique7

This project was a huge amount of hard work by @yu_raymond5 and @willhuey9. No matter what task I threw at them, they got it to work: really incredible work! It works surprisingly well, and we highly recommend you try it out.

This was joint work with @mukadammh and Anusha Nagabandi at Amazon!

Website (lots of fun videos!): https://weirdlabuw.github.io/score/
Paper: https://arxiv.org/abs/2606.27475

(10/10)


Abhishek Gupta @abhishekunique7

I also want to shout out some related work from friends over at RAI - ExpertGen (https://pages.rai-inst.com/expertgen/) that investigates related ideas! :)
