the most important thing to keep in mind about the PPO vs GRPO debate is that nobody agrees on what either algorithm even is
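Will Brown of Prime Intellect argues that PPO and GRPO lack consensus definitions, while Joseph Suarez points to a discrepancy between the PPO paper's pseudocode and how PPO is actually implemented.
The pseudocode in the original PPO paper suggests that gradients are accumulated per update epoch; in practice, including in the reference code, they are not.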
Users dismissed the PPO versus GRPO debate as taxonomy theater and a psyop, mocking it with terms like "ai shmai" because the underlying algorithms lack clear definitions.
one could reasonably claim that GRPO has little to do with groups and is mostly about advantage normalization, and that PPO has little to do with critics and is mostly about ratio clipping, but then "PPO vs GRPO" doesn't even make sense as a question
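
To make the two ingredients named in the post above concrete, here is a minimal pure-Python sketch of ratio clipping and of group-wise advantage normalization. The function names, the default `clip_eps=0.2`, the small `eps` in the denominator, and the use of the population standard deviation are all assumptions chosen for illustration; real implementations differ on exactly these details.

```python
import statistics


def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """Per-sample clipped objective: min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    `ratio` is pi_new(a|s) / pi_old(a|s). The default clip_eps is an assumption.
    """
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * advantage, clipped_ratio * advantage)


def group_normalized_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of samples for the same prompt.

    Assumption: population standard deviation plus a small eps to avoid
    division by zero; some implementations use a different estimator.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Usage: the two pieces compose without reference to a critic or to groups
# inside the clipping step itself.
advantages = group_normalized_advantages([1.0, 0.0, 0.0, 1.0])
objective = [clipped_surrogate(1.5, a) for a in advantages]
```

Nothing in `clipped_surrogate` cares where the advantage came from, and nothing in `group_normalized_advantages` cares whether the ratio is clipped, which is one way to read the point that the two labels describe separable choices.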
@willccbb Lack of consensus is not lack of certainty. The only ambiguity I have seen in PPO is that the pseudocode in the paper suggests that gradients are accumulated per update epoch, whereas in practice (including in the reference code) they are not.
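
One possible reading of the ambiguity described in the reply above, sketched in pure Python. The toy squared-error gradient is a hypothetical stand-in for the PPO objective, and both function names are invented for illustration; the only point is where the parameter update happens relative to the minibatch loop.

```python
def toy_grad(theta, batch):
    # Hypothetical stand-in loss: mean of 0.5 * (theta - x)^2 over the batch.
    return sum(theta - x for x in batch) / len(batch)


def accumulate_per_epoch(theta, minibatches, epochs, lr):
    """One reading of the pseudocode: sum gradients over an epoch, step once."""
    steps = 0
    for _ in range(epochs):
        g = 0.0
        for mb in minibatches:
            g += toy_grad(theta, mb) / len(minibatches)
        theta -= lr * g  # single update per epoch
        steps += 1
    return theta, steps


def step_per_minibatch(theta, minibatches, epochs, lr):
    """What the reply says happens in practice: update after every minibatch."""
    steps = 0
    for _ in range(epochs):
        for mb in minibatches:
            theta -= lr * toy_grad(theta, mb)  # update immediately
            steps += 1
    return theta, steps


# Usage: same data and hyperparameters, different number of updates.
data = [[1.0, 1.0], [3.0, 3.0]]
theta_a, steps_a = accumulate_per_epoch(0.0, data, epochs=2, lr=0.5)
theta_b, steps_b = step_per_minibatch(0.0, data, epochs=2, lr=0.5)
```

With two minibatches and two epochs, the first variant makes two parameter updates and the second makes four, so the two readings generally end at different parameters even on this toy problem.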
@willccbb I mean in the current discourse it's just a proxy for critic-based vs group-relative advantage in policy gradients respectively

@willccbb Yesterday over dinner the sentence "GRPO with group size 1" was said and it took like ten minutes to register that's probably just Reinforce

@willccbb Or even what and when they're comparing them.

@willccbb @charles_irl Maybe it is the friends we made along the way

@willccbb Is it not clear that this debate is group based advantage estimation vs critic based? Anyone that’s reading more into it is either missing the forest for the trees or prioritizing the wrong experiments wrt algo exploration

@willccbb Which one got rejected from the neurips(?) again?

@willccbb How can PPO be better if GRPO *is* PPO 🤔🤔

@willccbb @ariaurelium what is agent

@willccbb ai shmai

@willccbb You know how GRPO vs PPO debates are basically a cat psyop? Strip the labels: it's clipping versus normalized baselines. Taxonomy theater.