Data quality is too often the unsung hero of deep learning. Excited to finally share WARP-RM, where we train a model to learn the rate of task progress, rather than absolute progress, by warping the timescale of existing trajectory data. Check out @uynitsuj's thread for more! 👇
Introducing WARP-RM: A Warp-Augmented Relative Progress Reward Model for Data Curation 🧵
We gave our t-shirt folding robot more demonstrations and it got worse. Every extra demo ended in a successfully folded shirt. The data wasn't bad. It was noisy. The policy couldn't tell productive motion from dead time, and it imitated both equally. So which moments of a demo are actually worth copying?
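To make the "rate of progress" idea concrete, here is a rough sketch of how warping a demo's timescale could produce relative-progress training labels. It is an illustration only, not the WARP-RM implementation: the helper names `warp_window` and `relative_progress_labels`, the linear proxy progress(t) = t / (num_frames - 1), and all parameter values are assumptions made for this example.

```python
import numpy as np

def warp_window(num_frames, window, speed, start):
    """Pick `window` frame indices from one demo, stepping `speed` frames at a time.

    Hypothetical helper: speed > 1 plays the demo faster, speed < 1 slower.
    """
    idx = start + speed * np.arange(window)
    # Round to real frame indices and keep them inside the demo.
    return np.clip(np.round(idx).astype(int), 0, num_frames - 1)

def relative_progress_labels(idx, num_frames):
    """Progress change between consecutive sampled frames.

    Assumes a simple linear proxy: progress(t) = t / (num_frames - 1).
    """
    progress = idx / (num_frames - 1)
    return np.diff(progress)

# The same 101-frame demo sampled at two playback speeds.
idx_slow = warp_window(num_frames=101, window=5, speed=1.0, start=10)
idx_fast = warp_window(num_frames=101, window=5, speed=4.0, start=10)

rate_slow = relative_progress_labels(idx_slow, num_frames=101)
rate_fast = relative_progress_labels(idx_fast, num_frames=101)

print(idx_slow, rate_slow)
print(idx_fast, rate_fast)
```

Under these assumptions, the faster warp covers four times as much of the demo per step, so its rate labels are four times larger. A model trained on (warped window, rate) pairs would be asked how fast the task is advancing rather than how far along it is.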
🌐 Project Website: https://uynitsuj.github.io/warp-rm
📄 Paper: https://arxiv.org/abs/2606.28320
💻 Code: https://github.com/uynitsuj/WARP-RM
📨 XDOF blog post: https://xdof.ai/blog/warp-rm
