🤖 How can we teach dexterous robots to perform precise, contact-rich assembly?
Introducing Play2Perfect: first learn to play with objects, then perfect the policy for tight insertion, multi-part assembly, and screwing.
Sound on! 🔊
🧵👇
The system refines unstructured exploration into tight insertion and precise assembly maneuvers.
Users congratulated the researchers on Play2Perfect, which trains robots for precise, contact-rich assembly, praising its zero-shot sim-to-real transfer and overall effectiveness.
It's so satisfying to watch the natural, dexterous behavior that our Play2Perfect policy produces.
Play2Perfect first learns a base policy with generic skills like grasping, in-hand reorientation, and goal reaching.
Then it finetunes this policy for a specific assembly task.

To support further research in dexterous assembly, we are releasing our code, assets, and training/evaluation environments.
Full details + videos + code: https://play2perfect.github.io
This work wouldn't be possible without @kushalk_ (co-lead), @leto__jean, and Prof. Karen Liu. 🙏

To see why dexterous assembly is hard, here is a rollout slowed down to 0.5× speed.
Success requires chaining together grasping, recovery, alignment, insertion, and screwing. When executing at high speed, the policy must react to inevitable failures and correct on the fly.

Before we can learn the hard problem of precise assembly, we first learn the easier problem of playing with objects in free space.
Play2Perfect is a framework for:
1️⃣ Play pretraining on diverse objects and goals
2️⃣ Assembly finetuning for contact-rich tasks
3️⃣ Zero-shot sim-to-real transfer

Even after an initial failed grasp, the policy keeps acting closed-loop.
It can retry, regrasp, reorient, and continue until it completes the task.
This recovery behavior comes naturally from learning reusable dexterous play priors.
Sound on! 🔊

But... the type of play matters. We systematically study the key design choices during pretraining.
For transfer to precise assembly, we find four ingredients are important:
✅ Diverse objects ✅ Position + orientation control ✅ Goal trajectory diversity ✅ Goal precision
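
To make "position + orientation control" and "goal precision" concrete, here is a minimal sketch of how a play goal could be sampled and checked. This is not the released Play2Perfect code: the function names, the workspace box, and the tolerances `pos_tol` and `rot_tol` are all hypothetical, chosen only for illustration.

```python
import numpy as np


def random_unit_quaternion(rng):
    """Sample a uniformly random rotation as a unit quaternion (w, x, y, z)."""
    q = rng.normal(size=4)  # a normalized 4D Gaussian is uniform on the unit 3-sphere
    return q / np.linalg.norm(q)


def sample_goal_pose(rng, pos_low, pos_high):
    """Sample a goal position inside a box, plus a random goal orientation."""
    position = rng.uniform(pos_low, pos_high)
    orientation = random_unit_quaternion(rng)
    return position, orientation


def rotation_angle_between(q1, q2):
    """Smallest rotation angle (radians) taking unit quaternion q1 to q2."""
    # abs() handles the double cover: q and -q describe the same rotation.
    dot = abs(float(np.dot(q1, q2)))
    return 2.0 * np.arccos(np.clip(dot, 0.0, 1.0))


def goal_reached(pos, quat, goal_pos, goal_quat, pos_tol=0.01, rot_tol=0.1):
    """True if both position and orientation errors are within tolerance.

    pos_tol (meters) and rot_tol (radians) are assumed values, not the
    thresholds used in the paper. Tightening them is one way to express
    higher goal precision during play.
    """
    pos_err = np.linalg.norm(np.asarray(pos) - np.asarray(goal_pos))
    rot_err = rotation_angle_between(quat, goal_quat)
    return bool(pos_err <= pos_tol and rot_err <= rot_tol)
```

Under these assumptions, drawing a fresh pose from `sample_goal_pose` for each goal gives position and orientation targets together, and shrinking the two tolerances demands more precise goal reaching.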

Beyond improving RL sample efficiency by 33×, policies with play pretraining are more robust.
On a toy peg-in-hole task, Play2Perfect learns stable grasps. Training from scratch, by contrast, finds a brittle thumb-balancing strategy that fails under small perturbations.

Play2Perfect can perform tight insertion with clearances down to 0.5 mm.
Beyond precision, the policy learns to use contact to search for hole alignment, and discovers a tilted insertion strategy similar to robust peg-in-hole strategies studied by Chhatpar & Branicky (IROS 2001).

Play2Perfect learns diverse assembly tasks and adapts to diverse initial conditions.
✅ Tight insertion ✅ Multi-part assembly ✅ Screwing
From different part poses, the policy finds different ways to grasp, reorient, align, and complete the assembly.

@tylerlum23 @kushalk_ @leto__jean Congrats @tylerlum23 @kushalk_!! super cool work

@tylerlum23 @kushalk_ @leto__jean Amazing work!! so nice that the sim to real is zero shot :)

@tylerlum23 This is incredibly good. Nice work!

@tylerlum23 Would be interesting to see this with part variation and tool wear in production.