How do we achieve these surprising results? Instead of learning a global, direct state-to-action mapping, DARP reparameterizes the imitation learning problem through retrieval.
For each query state, we retrieve its k nearest expert neighbors, predict an action from each one conditioned on the difference vector between that neighbor's state and the query state, and aggregate those predictions into the final action. (2/7)
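To make the retrieve, difference, aggregate loop concrete, here is a rough sketch in Python, not the actual DARP implementation. Everything named here is an assumption for illustration: `retrieve_knn`, `darp_style_predict`, the Euclidean distance, the `query - neighbor` sign convention, passing the neighbor's action to the predictor, and mean aggregation. The hand-written `linear_predictor` stands in for what would be a learned network.

```python
import numpy as np

def retrieve_knn(query, expert_states, k):
    """Indices of the k expert states closest to the query (Euclidean, assumed)."""
    dists = np.linalg.norm(expert_states - query, axis=1)
    return np.argsort(dists)[:k]

def darp_style_predict(query, expert_states, expert_actions, predictor, k=5):
    """Predict one action per retrieved neighbor from its difference vector,
    then aggregate. Mean aggregation is an assumption of this sketch."""
    idx = retrieve_knn(query, expert_states, k)
    diffs = query - expert_states[idx]  # shape (k, state_dim)
    per_neighbor = np.stack([
        predictor(expert_states[i], expert_actions[i], d)
        for i, d in zip(idx, diffs)
    ])
    return per_neighbor.mean(axis=0)

# Toy setup: a linear "expert" a = W s, so a difference-aware predictor can be exact.
W = np.array([[1.0, 2.0], [0.0, -1.0]])
rng = np.random.default_rng(0)
expert_states = rng.normal(size=(50, 2))
expert_actions = expert_states @ W.T

def linear_predictor(neighbor_state, neighbor_action, diff):
    # Hypothetical stand-in for a learned model: correct the neighbor's
    # action using the state difference.
    return neighbor_action + W @ diff

query = np.array([0.3, -0.2])
pred = darp_style_predict(query, expert_states, expert_actions, linear_predictor, k=5)
```

In this toy case every neighbor's corrected prediction equals `W @ query`, so the aggregate does too. With a learned predictor, the per-neighbor predictions would differ and the aggregation step would actually matter.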
Here’s a pretty weird and surprising result: retrieval-augmented generation (RAG) works unreasonably well for robot learning, but only when parameterized using difference vectors!
We introduce Difference-Aware Retrieval Policies for Imitation Learning (DARP), a simple, semi-parametric RAG architecture for imitation learning that achieves gains of up to 200% over standard behavior cloning (BC). No additional assumptions beyond BC, just a little architecture switch! The theory backing it up is pretty cool too, and it works on real robots! :)
Play with our website to understand better: https://weirdlabuw.github.io/darp-site/
🧵(1/7)