New Paper: Human-like Autonomy Emerges from Self-Play and a Pinch of Human Data.
We trained driving policies with self-play RL on 60 years of simulated experience, using 1 GPU for ~15 hours. Regularizing with just 30 minutes of human demonstration data makes the resulting policies much more human-like!
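For a sense of scale, here is a quick back-of-the-envelope sketch of what those numbers imply. It assumes a 365.25-day year and treats "~15 hours" as exactly 15, so the results are rough. The function names are illustrative and not from the paper.

```python
# Back-of-the-envelope arithmetic for the figures quoted in the announcement.
# Assumption: one year = 365.25 days (not stated in the source).
HOURS_PER_YEAR = 365.25 * 24


def sim_hours(sim_years: float) -> float:
    """Simulated experience expressed in hours."""
    return sim_years * HOURS_PER_YEAR


def realtime_speedup(sim_years: float, wall_clock_hours: float) -> float:
    """How many times faster than real time the simulation ran."""
    return sim_hours(sim_years) / wall_clock_hours


def sim_to_human_ratio(sim_years: float, human_minutes: float) -> float:
    """Minutes of simulated experience per minute of human demonstration."""
    return sim_hours(sim_years) * 60 / human_minutes


if __name__ == "__main__":
    # 60 simulated years in ~15 wall-clock hours on one GPU.
    print(f"speedup over real time: {realtime_speedup(60, 15):,.0f}x")
    # 60 simulated years versus 30 minutes of human demonstrations.
    print(f"sim : human data ratio: {sim_to_human_ratio(60, 30):,.0f} : 1")
```

Under these assumptions the simulator runs roughly 35,000 times faster than real time, and the human demonstrations amount to about one part in a million of the total experience, which is the "pinch" in the title.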







