PPO creator John Schulman says the foundational reinforcement learning paper was originally rejected from NIPS 2017