Trained with PufferLib + PufferDrive! Really awesome method as well. We're considering integrating it into our native trainer in a future update.
New Paper: Human-like Autonomy Emerges from Self-Play and a Pinch of Human Data.
We trained a self-play RL policy on 60 years of simulated experience on a single GPU in ~15 hours. Regularizing with just 30 minutes of human demonstration data produces much more human-like driving policies!


