
Gradient Descent Beats RL for Post-Training Diffusion Models

Liang Zheng (@LiangZheng_06)

Diffusion is differentiable. LLMs aren't.

So why is the diffusion community copying RL methods (GRPO etc.) from LLMs?

The native post-training method for diffusion is gradient descent, as in ReFL and LeapAlign. Paper: http://arxiv.org/abs/2604.15311
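To make the "differentiable" point concrete, here is a minimal toy sketch in plain Python. It is not ReFL or LeapAlign, and nothing in it comes from the paper: the one-parameter sampler, the quadratic reward, and every name and constant (`STEPS`, `STEP_SIZE`, `TARGET`, `sample_with_grad`, `reward_with_grad`, `train`) are assumptions made up for illustration. It only shows the general idea that when both the sampler and the reward are differentiable, the reward gradient can be pushed back through the sampling steps by the chain rule and used directly for gradient ascent.

```python
import random

# Hypothetical toy constants, chosen only for illustration.
STEPS = 4        # number of sampler steps
STEP_SIZE = 0.5  # how far each step moves the sample toward theta
TARGET = 2.0     # the toy reward peaks when the sample equals this value


def sample_with_grad(theta, noise):
    """Run the toy sampler from `noise`, tracking d(sample)/d(theta)."""
    x, dx_dtheta = noise, 0.0
    for _ in range(STEPS):
        # Update rule: x_next = x + STEP_SIZE * (theta - x).
        # Chain rule: d(x_next)/d(theta) = (1 - STEP_SIZE) * dx/dtheta + STEP_SIZE.
        dx_dtheta = (1.0 - STEP_SIZE) * dx_dtheta + STEP_SIZE
        x = x + STEP_SIZE * (theta - x)
    return x, dx_dtheta


def reward_with_grad(x):
    """Differentiable toy reward r(x) = -(x - TARGET)^2 and its derivative dr/dx."""
    return -(x - TARGET) ** 2, -2.0 * (x - TARGET)


def train(theta=0.0, lr=0.1, iters=200, batch=16, seed=0):
    """Gradient ascent on the mean reward, backpropagating through the sampler."""
    rng = random.Random(seed)
    for _ in range(iters):
        grad = 0.0
        for _ in range(batch):
            x, dx_dtheta = sample_with_grad(theta, rng.gauss(0.0, 1.0))
            _, dr_dx = reward_with_grad(x)
            grad += dr_dx * dx_dtheta / batch  # dr/dtheta = dr/dx * dx/dtheta
        theta += lr * grad
    return theta


if __name__ == "__main__":
    print(train())
```

An RL-style method would instead treat the sampler as a black box and estimate the update from scalar rewards alone. Here the exact derivative `dr_dx * dx_dtheta` is available for every sample, which is the property the post is pointing at.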

6:52 PM · Jun 8, 2026 · 11.7K Views