DARP Retrieval Method Boosts Robot Imitation Learning Up To 200%

Abhishek Gupta@abhishekunique7

Here’s a pretty weird and surprising result - retrieval-augmented generation works unreasonably well for robot learning – but only when parameterized using difference vectors!

We introduce Difference-Aware Retrieval Policies for Imitation Learning (DARP), a simple, semi-parametric RAG architecture for imitation learning that achieves gains of up to 200% over standard behavior cloning. No additional assumptions beyond BC, just a little architecture switch! The theory backing it up is pretty cool too and it works on real robots! :)

Play with our website to understand better: https://weirdlabuw.github.io/darp-site/

🧵(1/7)

10:29 AM · Jun 9, 2026

Abhishek Gupta@abhishekunique7

How do we achieve these surprising results? Instead of learning a global, direct state-to-action mapping, DARP reparameterizes the imitation learning problem through retrieval.

We retrieve the k-nearest expert neighbors and predict actions conditioned on the relative difference vectors between those neighbor states and the query state, aggregating them for a final action prediction. (2/7)
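The retrieval step described in (2/7) can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names, the Euclidean distance, the mean aggregation, and the linear per-neighbor head (neighbor action plus a linear correction on the difference vector) are all assumptions made for the example. In the actual method the head would presumably be a learned network.

```python
import numpy as np

def knn_indices(query, expert_states, k):
    """Return indices of the k expert states nearest to `query` (Euclidean)."""
    dists = np.linalg.norm(expert_states - query, axis=1)
    return np.argsort(dists)[:k]

def make_linear_head(W):
    """Hypothetical per-neighbor head: neighbor action plus a linear
    correction computed from the difference vector."""
    def head(neighbor_action, diff):
        return neighbor_action + W @ diff
    return head

def retrieval_predict(query, expert_states, expert_actions, head, k=3):
    """Predict an action by aggregating one prediction per retrieved neighbor."""
    idx = knn_indices(query, expert_states, k)
    diffs = query - expert_states[idx]  # (k, state_dim) difference vectors
    per_neighbor = np.stack(
        [head(expert_actions[i], d) for i, d in zip(idx, diffs)]
    )
    return per_neighbor.mean(axis=0)  # mean aggregation is an assumption

# Toy check: an expert that acts linearly, a = A @ s.
rng = np.random.default_rng(0)
A = rng.normal(size=(2, 4))
expert_states = rng.normal(size=(50, 4))
expert_actions = expert_states @ A.T
query = rng.normal(size=4)
pred = retrieval_predict(
    query, expert_states, expert_actions, make_linear_head(A), k=5
)
```

In this toy setup the head only ever sees a neighbor's action and a difference vector, never the raw query state, yet it recovers the expert's action at the query exactly because the expert is linear.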

Abhishek Gupta@abhishekunique7

But why does DARP provide such large gains just through reparameterization? Theoretically, we show that DARP operates as an implicit manifold regularizer. By embedding neighborhood aggregation directly into the policy architecture, it achieves parameter-free Laplacian smoothing. This causes smooth, low-variance behavior, leading to more stable models and improved performance. (3/7)
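For readers unfamiliar with the term, here is a small sketch of Laplacian smoothing in its textbook graph form. It is not the paper's derivation, and it applies an explicit smoothing step, whereas the thread says DARP gets the effect implicitly from its architecture. The k-nearest-neighbor graph, the 0/1 edge weights, the step size, and every function name are assumptions for illustration. The energy tr(FᵀLF) equals half the weighted sum of squared differences between neighboring actions, so lowering it makes actions vary more smoothly across nearby states.

```python
import numpy as np

def knn_adjacency(states, k):
    """Symmetric 0/1 adjacency linking each state to its k nearest neighbors."""
    n = len(states)
    dists = np.linalg.norm(states[:, None, :] - states[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # never pick a state as its own neighbor
    W = np.zeros((n, n))
    for i in range(n):
        W[i, np.argsort(dists[i])[:k]] = 1.0
    return np.maximum(W, W.T)  # symmetrize

def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W."""
    return np.diag(W.sum(axis=1)) - W

def laplacian_energy(F, L):
    """Quadratic form tr(F^T L F) for per-node outputs F of shape (n, dim)."""
    return float(np.trace(F.T @ L @ F))

def pairwise_energy(F, W):
    """0.5 * sum_ij W_ij * ||F_i - F_j||^2, which equals tr(F^T L F)."""
    sq = np.sum((F[:, None, :] - F[None, :, :]) ** 2, axis=-1)
    return float(0.5 * np.sum(W * sq))

def smoothing_step(F, L):
    """One explicit gradient-style step on the energy, step = 1 / lambda_max."""
    step = 1.0 / np.linalg.eigvalsh(L).max()
    return F - step * (L @ F)

rng = np.random.default_rng(0)
states = rng.normal(size=(30, 3))
actions = rng.normal(size=(30, 2))  # deliberately noisy "actions"
W = knn_adjacency(states, k=4)
L = graph_laplacian(W)
smoothed = smoothing_step(actions, L)
```

After the step, the energy of `smoothed` is lower than that of `actions`, which is the sense in which neighborhood aggregation acts as a smoother here.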

Abhishek Gupta@abhishekunique7

Empirically, this difference-based reparameterization directly mitigates covariate shift.

By shifting the frame of reference to local distance vectors, query states that are globally out-of-distribution (OOD) remain locally in-distribution. This local consistency grants the policy remarkable robustness during closed-loop rollouts. (4/7)

Abhishek Gupta@abhishekunique7

Crucially, DARP remains entirely within the standard behavior cloning regime: no simulators, interactive experts, or online training are required.

DARP sees substantial improvement across architectures and input modalities, and even significantly improves upon diffusion policies in real-world robotic manipulation tasks. (5/7)

Abhishek Gupta@abhishekunique7

Empirically, DARP is dead simple to implement, just a little nearest neighbors + the simple aggregation architecture - no objective change, no data change. Since DARP is easy to scale, you can drop it in for your favorite policy class! Nicely integrates with rich distribution classes like diffusion and works directly from visual inputs! You basically get pretty huge gains across the board with very few changes required. (6/7)

Abhishek Gupta@abhishekunique7

This is a particularly exciting project because it is led by our amazing undergraduate @quinncomputer. Quinn was a tour-de-force on this project, he pulled it together with very little help from us! We basically stumbled on this pretty surprising result and then spent a bunch of time trying to figure out why it worked. That resulted in some pretty cool theory, worked out by the excellent @siddhss5. Take a look at the website and the paper for more details - and use it in your work and tell us how it does!

This was work published at #ICLR2026, with @khimya, Ethan Pronovost, Paarth Shah, @siddhss5.

Website: https://weirdlabuw.github.io/darp-site/
Paper: https://arxiv.org/abs/2606.09758
Colab: https://colab.research.google.com/drive/1N0kBjaT773HkzESaXw884wmsmEpZJjEy

@quinncomputer is applying for PhD programs this year, don't miss the chance to recruit him! :)
