
PPO creator John Schulman says the foundational reinforcement learning paper was originally rejected from NIPS 2017

Story Overview

John Schulman noted on X that the Proximal Policy Optimization (PPO) paper, now a cornerstone of reinforcement learning from human feedback (RLHF), was rejected from NIPS 2017, even though the method later anchored the training recipe behind ChatGPT-style models.


Original post

John Schulman @johnschulman2

PPO: rejected from NIPS 2017

8:11 AM · Jun 17, 2026 · 48.1K Views

Open Question

Why top venues sometimes overlook future classics

Schulman says the original reviews are in his old OpenAI Gmail account, so their exact wording is unavailable. From memory, he cites "the usual things: limited novelty, insufficient improvement over baselines." The rejection stands as a reminder that early signals can diverge from lasting influence.

Developer Impact

How PPO quietly shaped modern AI training

The method simplified earlier trust-region approaches while delivering strong results on robotics and game benchmarks, eventually becoming the default policy optimizer inside RLHF stacks that align today's large models.
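PPO's main simplification over TRPO is its clipped surrogate objective. Instead of enforcing a KL-divergence constraint, it clips the probability ratio between the new and old policy, so a single update gains nothing from pushing that ratio outside [1 - ε, 1 + ε]. The sketch below shows that objective in plain Python. The function name `ppo_clip_objective` and the default ε = 0.2 are illustrative assumptions. It deliberately omits the value-function loss, entropy bonus, and advantage estimation that a full PPO training loop would include.

```python
import math


def ppo_clip_objective(new_logprobs, old_logprobs, advantages, epsilon=0.2):
    """Mean PPO clipped surrogate objective over a batch (to be maximized).

    Hypothetical helper for illustration: takes per-action log-probabilities
    under the new and old policies plus precomputed advantage estimates.
    """
    if not (len(new_logprobs) == len(old_logprobs) == len(advantages)):
        raise ValueError("all inputs must have the same length")
    if not advantages:
        raise ValueError("batch must be non-empty")

    total = 0.0
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        # Probability ratio r = pi_new(a|s) / pi_old(a|s), computed in log space.
        ratio = math.exp(new_lp - old_lp)
        # Clip the ratio into [1 - epsilon, 1 + epsilon].
        clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        # Keep the smaller (pessimistic) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # Doubling an action's probability with a positive advantage is capped near 1.2.
    print(ppo_clip_objective([math.log(2.0)], [0.0], [1.0]))
```

Note the asymmetry. When the advantage is negative and the ratio has already grown, the unclipped term is kept, so the objective still penalizes the bad move. Clipping only removes the incentive to push further in a direction that already looks good.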

Sentiment

Commenters with positive sentiment offered John Schulman support after he disclosed the NIPS 2017 rejection of the PPO paper, while negative comments criticized the peer review process as flawed, with one reply calling it "a farce."

Positive: 40.0%

Negative: 60.0%

Based on 10 comments with sentiment.


Posts from X


Xiao Ma@infoxiao

@johnschulman2 legend


2h


Dhruv Srikanth@DhruvSrikanth

@johnschulman2 I think https://arxiv.org/abs/1503.02531 was rejected as well so it's in good company

3h


Gautam Kamath@thegautamkamath

@johnschulman2 I know people would be interested to see the reviews if you still have them!

3h


John Schulman@johnschulman2

@hallerite iirc, the usual things: limited novelty, insufficient improvement over baselines

46m

John Schulman@johnschulman2

@thegautamkamath I looked, but unfortunately it's in my openai gmail. maybe @prafdhar has access?

3h

RSC ☀️🌲@silver__tsuki

@johnschulman2 @shizhediao “By 2026 it will be clear that the impact of PPO on RL has been no greater than careful hyperparameter tuning” - The reviewer

1h

Duc Manh Bui@Buimanh22

@johnschulman2 Wow quite surprise for that @@ I think the field changed a lot recently but at that time why PPO could be rejected…

3h

Joseph Suarez 🐡@jsuarez

@johnschulman2 That's where my first conference paper got accepted! I just slammed together two SOTA language modeling methods and got +2%. Peer review in AI was a farce long before LLMs

2h

Pierre Richemond 🇪🇺@TheOneKloud

@johnschulman2 @canondetortugas Happens to the best !

2h

Nasib A. Naimi@NasibNaimi

@johnschulman2 oh boi, doesnt that tell you all you need to know. What reasons did the reviewers state?

2h

Mahaoo@mahaoo_ASI

@johnschulman2 @JoshPurtell official and obviously incorrect rejection is something that stays with you, many highly successful people remember these years after the fact, despite being of no additional consequences to their own lives

2h

yobibyte@y0b1byte

@johnschulman2

2h

Rafael Pardinas@muchomuchacho

@johnschulman2 PipelineRL: rejected from NeurIPS 2025

2h

hallerite@hallerite

@johnschulman2 there just seems to be an undeniable mismatch between academia, which cares about novelty and improving on baselines in a small controlled setting, and "the real world", which cares about the methods that are scalable and don't become unstable at scale

29m

National Champions 〽️🇺🇸🏀@JKSully313

@johnschulman2 I’d kill to see the Openreview (or equivalent) for PPO that got rejected.

2h

Think_Different_@ThinkDi92468945

@DhruvSrikanth @johnschulman2 That’s nuts (if true)

3h

Chris Nota@chris_nota_rl

@johnschulman2 @amit05prakash I heard they gave you hell over TRPO as well… from one of the reviewers! Jokes on them I guess

17m

Kishan@kpb_in_acad

@johnschulman2 Hehe. Similar to DDPG, PPO gets the test of time award at NeurIPS 2027:)

49m

Bojan Jakimovski@Shekswess

@johnschulman2 Thanks for PPO legend !

2h

tinyForest@tinyForest

@johnschulman2 Once upon a time in 2008, the leading method on the Netflix prize leaderboard was NIPS_REJECT, the reviewers were worried it wouldn't scale.

40m