
Will Brown of Prime Intellect argues PPO and GRPO lack consensus definitions, while Joseph Suarez points to PPO pseudocode discrepancies

Original PPO pseudocode incorrectly suggests epoch-based gradient accumulation


Original post

will brown @willccbb

the most important thing to keep in mind about the PPO vs GRPO debate is that nobody agrees on what either algorithm even is

1:47 PM · Jun 17, 2026 · 18.3K Views

Sentiment

Users dismissed the PPO versus GRPO debate as taxonomy theater and a psyop, mocking it with terms like "ai shmai" because the underlying algorithms lack clear definitions.

Positive: 0.0% · Negative: 100.0% (2 comments with sentiment)


Posts from X

Views 1.8K · Bookmarks 6 · Likes 39 · Retweets 2 · Replies 1

will brown @willccbb

one could reasonably claim that GRPO has little to do with groups and is mostly about advantage normalization, and that PPO has little to do with critics and is mostly about ratio clipping, but then "PPO vs GRPO" doesn't even make sense as a question
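To make the two ingredients named in this post concrete, here is a minimal NumPy sketch, not any library's actual implementation. The function names `group_normalized_advantages` and `clipped_surrogate` are hypothetical, and sequence-level log-probabilities are assumed for simplicity. It shows that ratio clipping does not care where the advantages come from, and that per-group normalization does not care what objective consumes it.

```python
import numpy as np

def group_normalized_advantages(rewards, eps=1e-8):
    """Normalize rewards within each group (row): subtract the mean, divide by the std.

    rewards has shape (num_prompts, group_size). This is the "advantage
    normalization" ingredient; no critic is involved.
    """
    rewards = np.asarray(rewards, dtype=float)
    mean = rewards.mean(axis=1, keepdims=True)
    std = rewards.std(axis=1, keepdims=True)
    return (rewards - mean) / (std + eps)

def clipped_surrogate(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate objective (to be maximized).

    This is the "ratio clipping" ingredient; the advantages can come from
    a critic, from group statistics, or from anything else.
    """
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return np.minimum(unclipped, clipped).mean()

# Usage: one prompt, four sampled completions with binary rewards.
adv = group_normalized_advantages([[1.0, 0.0, 0.0, 1.0]])
objective = clipped_surrogate(np.zeros(4), np.zeros(4), adv.ravel())
```

Under this reading, swapping the advantage function while keeping `clipped_surrogate` fixed changes the label people attach to the algorithm without changing the update rule.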


Joseph Suarez 🐡 @jsuarez

@willccbb Lack of consensus is not lack of certainty. The only ambiguity I have seen in PPO is that the pseudocode in the paper suggests that gradients are accumulated per update epoch, whereas in practice (including in the reference code) they are not.
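The distinction raised here can be illustrated with a toy sketch of the two loop structures. This is an illustration of one reading of the reply, not a reproduction of the paper's pseudocode or of any reference code: the quadratic loss, the function names, and the absence of minibatch shuffling are all simplifying assumptions.

```python
import numpy as np

def grad(theta, batch):
    # Gradient of the toy loss 0.5 * mean((theta - x)^2) with respect to theta.
    return theta - batch.mean()

def accumulate_per_epoch(theta, data, epochs, minibatch_size, lr):
    """Average minibatch gradients over a full pass, then take ONE step per epoch."""
    steps = 0
    for _ in range(epochs):
        g_sum, n = 0.0, 0
        for start in range(0, len(data), minibatch_size):
            g_sum += grad(theta, data[start:start + minibatch_size])
            n += 1
        theta -= lr * g_sum / n
        steps += 1
    return theta, steps

def step_per_minibatch(theta, data, epochs, minibatch_size, lr):
    """Take a step after EVERY minibatch, so later minibatches see updated parameters."""
    steps = 0
    for _ in range(epochs):
        for start in range(0, len(data), minibatch_size):
            theta -= lr * grad(theta, data[start:start + minibatch_size])
            steps += 1
    return theta, steps

# Usage: same data and hyperparameters, different number of parameter updates.
data = np.arange(8.0)
_, steps_accumulated = accumulate_per_epoch(0.0, data, epochs=3, minibatch_size=2, lr=0.1)
_, steps_minibatch = step_per_minibatch(0.0, data, epochs=3, minibatch_size=2, lr=0.1)
```

With 3 epochs and 4 minibatches per epoch, the first variant updates the parameters 3 times and the second 12 times, which is why the two readings are not interchangeable.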


Ariel @redtachyon

@willccbb I mean in the current discourse it's just a proxy for critic-based vs group-relative advantage in policy gradients respectively


Bryan Cheong @bryancsk

@willccbb Yesterday over dinner the sentence "GRPO with group size 1" was said and it took like ten minutes to register that's probably just Reinforce
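A small sketch of why the group-size-1 case is degenerate, assuming the group-relative advantage is computed by mean-centering (and optionally std-scaling) rewards within each group. The function names are hypothetical. With a single sample per group, the sample is its own baseline, so the centered advantage is exactly zero; the update only carries signal if the baseline is dropped, which leaves the raw reward as the weight on the log-probability gradient, as in plain REINFORCE.

```python
import numpy as np

def group_relative_advantages(rewards, normalize_std=True, eps=1e-8):
    """Mean-center (and optionally std-scale) rewards within each group (row)."""
    rewards = np.asarray(rewards, dtype=float)
    centered = rewards - rewards.mean(axis=1, keepdims=True)
    if normalize_std:
        centered = centered / (rewards.std(axis=1, keepdims=True) + eps)
    return centered

def reinforce_weights(rewards):
    """No baseline: the raw reward weights the log-probability gradient."""
    return np.asarray(rewards, dtype=float)

# Usage: group size 1 versus group size 2.
single = group_relative_advantages([[0.7]])     # the sample is its own mean
pair = group_relative_advantages([[1.0, 0.0]])  # a real within-group comparison
raw = reinforce_weights([[0.7]])
```

Here `single` is all zeros while `raw` keeps the reward of 0.7, so any non-trivial "GRPO with group size 1" has to fall back on something like the second function.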


Chris 🇨🇦 @llm_wizard

@willccbb Or even what and when they're comparing them.


Pratt @PrathmeshBhat19

@willccbb @charles_irl Maybe it is the friends we made along the way


Vamsi Aribandi @varibandi

@willccbb Is it not clear that this debate is group based advantage estimation vs critic based? Anyone that’s reading more into it is either missing the forest for the trees or prioritizing the wrong experiments wrt algo exploration
