PPO creator John Schulman says the foundational reinforcement learning paper was originally rejected from NIPS 2017

Story Overview

John Schulman noted that the Proximal Policy Optimization (PPO) paper, now a cornerstone of reinforcement learning from human feedback (RLHF), was rejected from NIPS 2017. The method later anchored the training recipe behind ChatGPT-style models.

Original post

John Schulman @johnschulman2

PPO: rejected from NIPS 2017

8:11 AM · Jun 17, 2026 · 32.4K Views

Open Question

Why top venues sometimes overlook future classics

Without the original reviewer comments, which Schulman says are in his old OpenAI Gmail account, the reviewers' specific objections remain unknown. The rejection is a reminder that a paper's initial reception can diverge from its lasting influence.

Developer Impact

How PPO quietly shaped modern AI training

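The method simplified earlier trust-region approaches such as TRPO by replacing a constrained update with a clipped surrogate objective that ordinary first-order gradient methods can optimize. It delivered strong results on simulated robotics and Atari game benchmarks and eventually became the standard policy optimizer in the RLHF pipelines used to align today's large models.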
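To make that simplification concrete, here is a minimal Python sketch of the clipped surrogate objective described in the PPO paper, L^CLIP = E_t[min(r_t A_t, clip(r_t, 1 - ε, 1 + ε) A_t)], where r_t is the ratio of new to old action probabilities and A_t is an advantage estimate. The function name, the plain-list inputs, and the default ε = 0.2 are illustrative assumptions; a real training loop would evaluate this over batched tensors and typically add value-function and entropy terms.

```python
import math
from typing import Sequence


def ppo_clip_objective(
    new_logprobs: Sequence[float],
    old_logprobs: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,  # assumed default; a tunable hyperparameter
) -> float:
    """Mean PPO clipped surrogate objective (to be maximized).

    Hypothetical helper: inputs are per-action log-probabilities under the
    current policy, under the policy that collected the data, and advantages.
    """
    n = len(advantages)
    if n == 0 or len(new_logprobs) != n or len(old_logprobs) != n:
        raise ValueError("inputs must be non-empty and the same length")

    total = 0.0
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        # Probability ratio r_t = pi_new(a|s) / pi_old(a|s), computed in log space.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to the interval [1 - eps, 1 + eps].
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Keep the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / n


# A gradient-based optimizer would minimize the negated value as a loss.
print(ppo_clip_objective([math.log(1.5)], [0.0], [1.0]))  # ratio 1.5 is clipped to 1.2
```

Clipping removes any incentive to move the ratio beyond 1 ± ε in the direction the advantage favors, which plays roughly the role of TRPO's explicit KL constraint without the conjugate-gradient step TRPO relies on.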

Sentiment

Many users praised Schulman as a legend for PPO despite its NIPS 2017 rejection, while others called the decision obviously incorrect and peer review flawed.

Positive: 60.0% · Negative: 40.0% (based on 5 comments with sentiment)

Posts from X

Gautam Kamath @thegautamkamath

@johnschulman2 I know people would be interested to see the reviews if you still have them!

Dhruv Srikanth @DhruvSrikanth

@johnschulman2 I think https://arxiv.org/abs/1503.02531 was rejected as well, so it's in good company

John Schulman @johnschulman2

@thegautamkamath I looked, but unfortunately it's in my OpenAI Gmail. Maybe @prafdhar has access?

Xiao Ma @infoxiao

@johnschulman2 legend

Duc Manh Bui @Buimanh22

@johnschulman2 Wow, quite surprised by that @@ I think the field has changed a lot recently, but at that time, why could PPO be rejected…

Joseph Suarez 🐡 @jsuarez

@johnschulman2 That's where my first conference paper got accepted! I just slammed together two SOTA language modeling methods and got +2%. Peer review in AI was a farce long before LLMs

Pierre Richemond 🇪🇺 @TheOneKloud

@johnschulman2 @canondetortugas Happens to the best!

Nasib A. Naimi @NasibNaimi

@johnschulman2 Oh boi, doesn't that tell you all you need to know? What reasons did the reviewers state?

Mahaoo @mahaoo_ASI

@johnschulman2 @JoshPurtell An official and obviously incorrect rejection is something that stays with you. Many highly successful people remember these years after the fact, even though they had no further consequences for their own lives
