Now, SCORE is not magic: there are clear limitations. The real-world policy can be improved by choosing better behaviors inside its support, but it cannot create behaviors that were never present in the data. That makes SCORE reliant on the coverage and capabilities of the base policy. (9/10)
What is pretty cool is that SCORE does not require perfect (or even near-perfect) base policies for successful improvement. What matters is the base policy's coverage: failures, recoveries, and play data can all expand what the real-world policy is able to do.
Even if the base policy does not use these behaviors reliably for zero-shot success, SCORE can learn to steer towards them in simulation.
Some of our most robust policies came not from cleaner datasets, but from broader ones! (8/10)
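
To make the support argument concrete, here is a toy sketch. It is not SCORE's actual algorithm: it assumes steering can be approximated by best-of-N selection over samples from the base policy, and the behavior names, scores, and probabilities are all hypothetical.

```python
import random

# Hypothetical behaviors and their task scores (illustrative values only).
SCORES = {"fumble": 0.0, "grasp": 0.6, "recover_and_grasp": 1.0}

def steer(base_policy, score, n_samples, rng):
    """Best-of-N selection: sample from the base policy, keep the top-scoring behavior.

    This stands in for steering; it can only return something the base policy sampled.
    """
    candidates = [base_policy(rng) for _ in range(n_samples)]
    return max(candidates, key=score)

def narrow_policy(rng):
    # Trained on a "clean" dataset: recoveries never appear in its support.
    return rng.choices(["fumble", "grasp"], weights=[0.5, 0.5])[0]

def broad_policy(rng):
    # Trained on broader data: recoveries are rare (5%) but present.
    return rng.choices(
        ["fumble", "grasp", "recover_and_grasp"],
        weights=[0.55, 0.40, 0.05],
    )[0]

rng = random.Random(0)
narrow_best = steer(narrow_policy, SCORES.get, 64, rng)
broad_best = steer(broad_policy, SCORES.get, 64, rng)
print("narrow:", narrow_best, "| broad:", broad_best)
```

No matter how many samples are drawn, `narrow_best` can never be `recover_and_grasp`, because that behavior is outside the narrow policy's support. The broad policy rarely produces it on a single draw, yet selection over many samples will usually surface it, which is the sense in which broader data helps.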
