Tech · 14h ago

Researchers train human-like autonomous driving agents on a single GPU in 15 hours using PufferLib

Self-play training on 60 years of simulated driving was regularized with 30 minutes of human demonstration data

Original post

Joseph Suarez 🐡 @jsuarez · #1687 in Tech

Trained with PufferLib + PufferDrive! Really awesome method as well. Considering integrating into our native trainer in a future update.

Daphne Cornelisse @daphne_cor

New Paper: Human-like Autonomy Emerges from Self-Play and a Pinch of Human Data.

We trained self-play RL on 60 years of simulation on 1 GPU in ~15 hours. Regularizing with 30 minutes of demonstration data produces much more human-like driving policies!

11:31 AM · Jun 19, 2026 · 14.3K Views

Sentiment

Users are excited by self-play RL producing human-like driving policies with minimal data, praising the approach as superhuman and thanking collaborators for the achievement.

Positive: 100.0%

Negative: 0.0%

3 comments with sentiment.

Posts from X

Most Activity

Views: 693 · Bookmarks: 6 · Likes: 13

Daphne Cornelisse @daphne_cor

Project page: https://spiced-self-play.com/ Arxiv: https://arxiv.org/abs/2606.19370

14h ago

Retweets: 15

Daphne Cornelisse @daphne_cor

New Paper: Human-like Autonomy Emerges from Self-Play and a Pinch of Human Data.

We trained self-play RL on 60 years of simulation on 1 GPU in ~15 hours. Regularizing with 30 minutes of demonstration data produces much more human-like driving policies!

14h ago

Replies: 1

kache @yacineMTB

@daphne_cor Unregularized is a better driver. Deploy it in real life. Superhuman

10h ago

Daphne Cornelisse @daphne_cor

A big thank you to my collaborators for their unique contributions to this work: @julianh651, Zixu Zhang, Waël Doulazmi, @kev_joseph_, Jaime Fernández Fisac, and @EugeneVinitsky!

14h ago

Victor Butoi @ion_barrel

@daphne_cor Cool!!!

13h ago

kache @yacineMTB

@daphne_cor The reason unregularized drives the way it does is because it has god sight. Give it only raytraced pixels by simulating lidar

10h ago