Gradient Descent Beats RL for Post-Training Diffusion Models

Original post

Luca Ambrogioni#1824

Liang Zheng (@LiangZheng_06)

Diffusion is differentiable. LLMs aren't.

So why is the diffusion community copying RL methods (GRPO etc.) from LLMs?

The native post-training method for diffusion is gradient descent, as in ReFL and LeapAlign. Paper: http://arxiv.org/abs/2604.15311

6:52 PM · Jun 8, 2026 · 11.7K Views
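To make the post's distinction concrete, here is a minimal toy sketch. It is not ReFL, LeapAlign, or GRPO. It assumes a one-step "sampler" x = theta + sigma * eps and a hypothetical differentiable reward -(x - target)^2, and compares two ways to estimate the gradient of the expected reward with respect to theta. The first backpropagates through the sample (the pathwise gradient, which is only possible when sampling is differentiable). The second is the score-function (REINFORCE-style) estimator that policy-gradient RL methods rely on. All function names and constants are illustrative assumptions.

```python
import random
import statistics


def reward(x, target=2.0):
    # Hypothetical differentiable reward: highest when x equals target.
    return -(x - target) ** 2


def reward_grad(x, target=2.0):
    # Analytic derivative of the reward with respect to x.
    return -2.0 * (x - target)


def pathwise_grad(theta, eps, sigma=1.0):
    # Differentiate through the sample: x = theta + sigma * eps, so dx/dtheta = 1.
    x = theta + sigma * eps
    return reward_grad(x)


def score_function_grad(theta, eps, sigma=1.0):
    # REINFORCE-style estimator: reward * d/dtheta log N(x; theta, sigma^2).
    # It uses only the reward value, never the reward's derivative.
    x = theta + sigma * eps
    return reward(x) * (x - theta) / sigma ** 2


def compare_estimators(theta=0.0, n=20000, seed=0):
    # Feed both estimators the same noise draws so the comparison is fair.
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    path = [pathwise_grad(theta, e) for e in noise]
    score = [score_function_grad(theta, e) for e in noise]
    return {
        "pathwise_mean": statistics.fmean(path),
        "pathwise_var": statistics.variance(path),
        "score_mean": statistics.fmean(score),
        "score_var": statistics.variance(score),
    }


if __name__ == "__main__":
    for name, value in compare_estimators().items():
        print(f"{name}: {value:.3f}")
```

In this toy setting both estimators are unbiased for the same gradient, -2 * (theta - target), but the pathwise estimator has much lower variance because it uses the reward's derivative directly. Whether that advantage carries over to full multi-step diffusion samplers and learned reward models is the paper's claim, not something this sketch demonstrates.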
