DARP Retrieval Method Boosts Robot Imitation Learning Up To 200%

Original post
Abhishek Gupta @abhishekunique7 · #529 in AI

Here’s a pretty weird and surprising result: retrieval-augmented generation works unreasonably well for robot learning, but only when parameterized using difference vectors!

We introduce Difference-Aware Retrieval Policies for Imitation Learning (DARP), a simple, semi-parametric RAG architecture for imitation learning that achieves gains of up to 200% over standard behavior cloning. No additional assumptions beyond BC, just a little architecture switch! The theory backing it up is pretty cool too and it works on real robots! :)

Play with our website to understand better: https://weirdlabuw.github.io/darp-site/

🧵(1/7)

10:29 AM · Jun 9, 2026 · 5.7K Views
Sentiment

Users are excited about the DARP Retrieval Method boosting robot imitation learning up to 200% because of its impressive results and strong undergraduate leadership on the project.

Positive: 100.0% · Negative: 0.0% (2 comments with sentiment)

Posts from X
Abhishek Gupta @abhishekunique7

How do we achieve these surprising results? Instead of learning a global, direct state-to-action mapping, DARP reparameterizes the imitation learning problem through retrieval.

We retrieve the k-nearest expert neighbors and predict actions conditioned on the relative difference vectors between those neighbor states and the query state, aggregating them for a final action prediction. (2/7)
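The retrieve, difference, and aggregate steps described in this post can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function name `darp_predict`, the two toy `head` functions, the sign of the difference (neighbor minus query), feeding the neighbor's action alongside the difference, and plain averaging as the aggregator are all hypothetical choices made here.

```python
import numpy as np

def darp_predict(query, states, actions, head, k=2):
    """Retrieve the k nearest expert states, let `head` predict one action
    per neighbor from (difference vector, neighbor action), then average."""
    dists = np.linalg.norm(states - query, axis=1)  # distance to every expert state
    idx = np.argsort(dists)[:k]                     # indices of the k nearest neighbors
    diffs = states[idx] - query                     # difference vectors (assumed sign)
    per_neighbor = head(diffs, actions[idx])        # shape (k, action_dim)
    return per_neighbor.mean(axis=0)                # assumed aggregation: simple mean

# Toy expert data following a = 2 * s (made up for this example).
states = np.array([[0.0], [1.0], [2.0], [10.0]])
actions = 2.0 * states
query = np.array([1.2])

# Hypothetical head 1: ignores the differences, i.e. plain k-NN averaging.
knn_head = lambda diffs, acts: acts
# Hypothetical head 2: corrects each neighbor's action using the difference.
linear_head = lambda diffs, acts: acts - 2.0 * diffs

knn_action = darp_predict(query, states, actions, knn_head)
corrected_action = darp_predict(query, states, actions, linear_head)
print(knn_action, corrected_action)
```

In this toy case the two nearest neighbors are the states 1.0 and 2.0. Averaging their actions gives 3.0, while the head that also sees the difference vectors recovers the exact expert action 2.4 = 2 × 1.2, which hints at why conditioning on differences can matter.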

3h · Views 803 · Likes 4 · Bookmarks 1 · Replies 1
Abhishek Gupta @abhishekunique7

But why does DARP provide such large gains just through reparameterization? Theoretically, we show that DARP operates as an implicit manifold regularizer. By embedding neighborhood aggregation directly into the policy architecture, it achieves parameter-free Laplacian smoothing. This causes smooth, low-variance behavior, leading to more stable models and improved performance. (3/7)
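One way to see the link between neighborhood aggregation and Laplacian smoothing is a small numerical check. The sketch below is a generic graph-smoothing toy, not the paper's theory: the 1-D data, the noise level, and the choice of k are made up here. It shows that averaging a signal over k-nearest neighbors equals one step of smoothing with the random-walk graph Laplacian, and that this step involves no learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 5
x = np.linspace(0.0, 1.0, n)
clean = np.sin(2 * np.pi * x)                   # smooth underlying signal
noisy = clean + 0.3 * rng.standard_normal(n)    # noisy observations

# Build a k-NN graph; each point counts itself among its k neighbors.
dist = np.abs(x[:, None] - x[None, :])
nbrs = np.argsort(dist, axis=1)[:, :k]
A = np.zeros((n, n))
A[np.arange(n)[:, None], nbrs] = 1.0

P = A / A.sum(axis=1, keepdims=True)   # row-normalized adjacency, D^{-1} A
L_rw = np.eye(n) - P                   # random-walk graph Laplacian

smoothed = P @ noisy                   # neighborhood mean of the noisy signal
mse_noisy = np.mean((noisy - clean) ** 2)
mse_smoothed = np.mean((smoothed - clean) ** 2)
print(mse_noisy, mse_smoothed)
```

Here `P @ noisy` is identical to `noisy - L_rw @ noisy`, so the neighborhood mean is a Laplacian smoothing step by construction, and in this toy setting it lowers the error against the clean signal.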

3h · Views 587 · Likes 4 · Bookmarks 1
Abhishek Gupta @abhishekunique7

Empirically, this difference-based reparameterization directly mitigates covariate shift.

By shifting the frame of reference to local distance vectors, query states that are globally out-of-distribution (OOD) remain locally in-distribution. This local consistency grants the policy remarkable robustness during closed-loop rollouts. (4/7)
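A deliberately tiny 1-D toy can illustrate what "globally OOD, locally in-distribution" might look like. Everything here is an assumption made for the example: the expert states on an integer grid, the helper `neighbor_diffs`, and the use of a simple min/max range as the notion of "in-distribution". It is not a claim about how the paper measures covariate shift.

```python
import numpy as np

expert_states = np.arange(0.0, 100.0)   # 1-D expert states 0, 1, ..., 99
k = 2

def neighbor_diffs(query, pool, k):
    """Difference vectors (neighbor minus query) to the k nearest states in `pool`."""
    idx = np.argsort(np.abs(pool - query))[:k]
    return pool[idx] - query

# Differences seen at training time: each expert state is queried
# against all the other expert states (itself excluded).
train_diffs = np.concatenate([
    neighbor_diffs(s, np.delete(expert_states, i), k)
    for i, s in enumerate(expert_states)
])

query = 100.0                            # just past the largest expert state
query_diffs = neighbor_diffs(query, expert_states, k)

# Is the raw state inside the range of states seen in training?
raw_in_range = bool(expert_states.min() <= query <= expert_states.max())
# Are its difference vectors inside the range of differences seen in training?
diffs_in_range = bool(np.all((query_diffs >= train_diffs.min())
                             & (query_diffs <= train_diffs.max())))
print(raw_in_range, diffs_in_range)
```

The raw query lies outside every state the expert visited, yet its difference vectors to the retrieved neighbors fall within the range of differences already seen during training, so a model that consumes differences is not asked to extrapolate in this toy case.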

3h · Views 156 · Likes 2 · Bookmarks 0
Abhishek Gupta @abhishekunique7

Empirically, DARP is dead simple to implement: just a little nearest-neighbor retrieval plus the simple aggregation architecture, with no objective change and no data change. Since DARP is easy to scale, you can drop it in for your favorite policy class! It integrates nicely with rich distribution classes like diffusion and works directly from visual inputs! You basically get pretty huge gains across the board with very few changes required. (6/7)
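To make the "no objective change, no data change" point concrete, here is a toy comparison in which only the model's inputs differ while the regression objective and the dataset stay the same. It is a sketch under heavy assumptions, not the paper's setup: the sine-shaped `expert`, the linear heads fit by least squares, the helper names `darp_features` and `bc_features`, and the choice to feed each neighbor's action plus its difference vector are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def expert(s):
    return np.sin(3.0 * s)               # hypothetical nonlinear expert

train_s = rng.uniform(-2.0, 2.0, size=500)
train_a = expert(train_s)
test_s = rng.uniform(-2.0, 2.0, size=200)
test_a = expert(test_s)

def darp_features(queries, k=5, exclude_self=False):
    """Average the per-neighbor inputs [neighbor action, difference, 1]
    over the k nearest expert states."""
    dist = np.abs(queries[:, None] - train_s[None, :])
    if exclude_self:
        np.fill_diagonal(dist, np.inf)   # a training query must not retrieve itself
    idx = np.argsort(dist, axis=1)[:, :k]
    diffs = train_s[idx] - queries[:, None]
    feats = np.stack([train_a[idx], diffs, np.ones_like(diffs)], axis=-1)  # (n, k, 3)
    # With a linear head, averaging inputs equals averaging per-neighbor outputs.
    return feats.mean(axis=1)

def bc_features(queries):
    """Plain behavior cloning baseline: the raw state plus a bias term."""
    return np.stack([queries, np.ones_like(queries)], axis=-1)

# Same objective for both models: least-squares regression onto expert actions.
w_bc, *_ = np.linalg.lstsq(bc_features(train_s), train_a, rcond=None)
w_darp, *_ = np.linalg.lstsq(darp_features(train_s, exclude_self=True), train_a, rcond=None)

mse_bc = np.mean((bc_features(test_s) @ w_bc - test_a) ** 2)
mse_darp = np.mean((darp_features(test_s) @ w_darp - test_a) ** 2)
print(mse_bc, mse_darp)
```

On this toy problem the retrieval-based model should reach a much lower held-out error than the global linear fit, since it only has to model local structure around retrieved neighbors. The size of that gap says nothing about the gains reported in the thread.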

3h · Views 304 · Likes 1 · Bookmarks 0
Abhishek Gupta @abhishekunique7

This is a particularly exciting project because it is led by our amazing undergraduate @quinncomputer. Quinn was a tour de force on this project; he pulled it together with very little help from us! We basically stumbled on this pretty surprising result and then spent a bunch of time trying to figure out why it worked. That resulted in some pretty cool theory, worked out by the excellent @siddhss5. Take a look at the website and the paper for more details, use it in your work, and tell us how it does!

This was work published at #ICLR2026, with @khimya, Ethan Pronovost, Paarth Shah, @siddhss5.

Website: https://weirdlabuw.github.io/darp-site/
Paper: https://arxiv.org/abs/2606.09758
Colab: https://colab.research.google.com/drive/1N0kBjaT773HkzESaXw884wmsmEpZJjEy

@quinncomputer is applying for PhD programs this year, don't miss the chance to recruit him! :)

3h · Views 278 · Likes 4 · Bookmarks 0
Abhishek Gupta @abhishekunique7

Crucially, DARP remains entirely within the standard behavior cloning regime: no simulators, interactive experts, or online training are required.

DARP sees substantial improvement across architectures and input modalities, and even significantly improves upon diffusion policies in real-world robotic manipulation tasks. (5/7)

3h · Views 141 · Likes 0 · Bookmarks 0
Max For AI @MaxForAI

@abhishekunique7 Truly astonishing results

2h · Views 16