PPO: rejected from NIPS 2017
PPO creator John Schulman says the foundational reinforcement learning paper was originally rejected from NIPS 2017
Story Overview
John Schulman just noted that the Proximal Policy Optimization paper, now a cornerstone of reinforcement learning from human feedback, failed to clear the bar for NIPS 2017 even though it later anchored the training recipe behind ChatGPT-style models.
Why top venues sometimes overlook future classics
Absent the original reviewer comments, which Schulman says are buried in an old inbox, the precise objections remain unknown, so the rejection stands as a reminder that early signals can diverge from lasting influence.
How PPO quietly shaped modern AI training
The method simplified earlier trust-region approaches while delivering strong results on robotics and game benchmarks, eventually becoming the default policy optimizer inside RLHF stacks that align today's large models.
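As a rough illustration of what "simplified earlier trust-region approaches" means in practice, here is a minimal sketch of PPO's clipped surrogate objective in plain Python. The function name `ppo_clip_objective` and its argument names are hypothetical, chosen only for this example, and the default `eps=0.2` is a commonly cited setting rather than anything stated in the story. Instead of solving a constrained trust-region problem, the objective simply clips the probability ratio between the new and old policy.

```python
import math

def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Mean clipped surrogate objective (a quantity to maximize).

    logp_new / logp_old: per-sample log-probabilities of the taken actions
    under the new and old policies. advantages: per-sample advantage
    estimates. All names here are illustrative assumptions.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio r = pi_new(a|s) / pi_old(a|s).
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped = max(1.0 - eps, min(ratio, 1.0 + eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a positive advantage, pushing the ratio above `1 + eps` earns no extra objective, which removes the incentive for very large policy updates; with a negative advantage, the unclipped term is kept so that harmful moves are still fully penalized.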
Many users praised John Schulman as a legend for PPO despite its NIPS 2017 rejection, while others called the decision obviously incorrect and described peer review as flawed.
Most Activity

@johnschulman2 I know people would be interested to see the reviews if you still have them!

@johnschulman2 I think https://arxiv.org/abs/1503.02531 was rejected as well, so it's in good company

@thegautamkamath I looked, but unfortunately it's in my openai gmail. maybe @prafdhar has access?
@johnschulman2 legend
PPO: rejected from NIPS 2017

@johnschulman2 Wow, quite a surprise @@ I think the field has changed a lot recently, but why would PPO have been rejected at that time…

@johnschulman2 That's where my first conference paper got accepted! I just slammed together two SOTA language modeling methods and got +2%. Peer review in AI was a farce long before LLMs

@johnschulman2 @canondetortugas Happens to the best !

@johnschulman2 oh boi, doesn't that tell you all you need to know. What reasons did the reviewers state?

@johnschulman2 @JoshPurtell an official and obviously incorrect rejection is something that stays with you; many highly successful people remember these years after the fact, despite them being of no additional consequence to their own lives

@johnschulman2 PipelineRL: rejected from NeurIPS 2025

@johnschulman2 I’d kill to see the Openreview (or equivalent) for PPO that got rejected.

@DhruvSrikanth @johnschulman2 That’s nuts (if true)

@johnschulman2 Thanks for PPO legend !

@johnschulman2 with what reasoning?