PPO: rejected from NIPS 2017
PPO creator John Schulman says the foundational reinforcement learning paper was originally rejected from NIPS 2017
Story Overview
John Schulman just noted that the Proximal Policy Optimization paper, now a cornerstone of reinforcement learning from human feedback, failed to clear the bar for NIPS 2017 even though it later anchored the training recipe behind ChatGPT-style models.
Why top venues sometimes overlook future classics
Absent the original reviewer comments, which Schulman says are buried in an old inbox, the precise objections remain unknown, so the rejection stands as a reminder that early signals can diverge from lasting influence.
How PPO quietly shaped modern AI training
The method simplified earlier trust-region approaches while delivering strong results on robotics and game benchmarks, eventually becoming the default policy optimizer inside RLHF stacks that align today's large models.
Supportive replies rallied behind John Schulman over the PPO paper's rejection from NIPS 2017, while critical replies called the peer review process flawed and farcical.
Most Activity
@johnschulman2 legend
PPO: rejected from NIPS 2017

@johnschulman2 I think https://arxiv.org/abs/1503.02531 was rejected as well, so it's in good company

@johnschulman2 I know people would be interested to see the reviews if you still have them!

@hallerite iirc, the usual things: limited novelty, insufficient improvement over baselines

@thegautamkamath I looked, but unfortunately it's in my openai gmail. maybe @prafdhar has access?

@johnschulman2 @shizhediao “By 2026 it will be clear that the impact of PPO on RL has been no greater than careful hyperparameter tuning” - The reviewer

@johnschulman2 Wow quite surprise for that @@ I think the field changed a lot recently but at that time why PPO could be rejected…

@johnschulman2 That's where my first conference paper got accepted! I just slammed together two SOTA language modeling methods and got +2%. Peer review in AI was a farce long before LLMs

@johnschulman2 @canondetortugas Happens to the best !

@johnschulman2 oh boi, doesnt that tell you all you need to know. What reasons did the reviewers state?

@johnschulman2 @JoshPurtell an official and obviously incorrect rejection is something that stays with you. Many highly successful people remember these years after the fact, despite their being of no additional consequence to their own lives

@johnschulman2 PipelineRL: rejected from NeurIPS 2025

@johnschulman2 there just seems to be an undeniable mismatch between academia, which cares about novelty and improving on baselines in a small controlled setting, and "the real world", which cares about the methods that are scalable and don't become unstable at scale

@johnschulman2 I’d kill to see the Openreview (or equivalent) for PPO that got rejected.

@DhruvSrikanth @johnschulman2 That’s nuts (if true)

@johnschulman2 @amit05prakash I heard they gave you hell over TRPO as well… from one of the reviewers! Joke's on them I guess

@johnschulman2 Hehe. Similar to DDPG, PPO gets the test of time award at NeurIPS 2027:)

@johnschulman2 Thanks for PPO legend !

@johnschulman2 Once upon a time in 2008, the leading method on the Netflix prize leaderboard was NIPS_REJECT, the reviewers were worried it wouldn't scale.