Original post
Luca Ambrogioni#1824
Liang Zheng (@LiangZheng_06)
Diffusion is differentiable. LLMs aren't.
So why is the diffusion community copying RL methods (GRPO etc.) from LLMs?
The native post-training method for diffusion is gradient descent, as in ReFL and LeapAlign. Paper: http://arxiv.org/abs/2604.15311
6:52 PM · Jun 8, 2026 · 11.7K Views
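To make the "diffusion is differentiable" point concrete, here is a minimal toy sketch of backpropagating a reward through an unrolled sampling chain. It is not ReFL or LeapAlign: the one-parameter linear "sampler", the quadratic reward, and every name below (`sample`, `reward_grad`, `finetune`, `decay`, `target`) are assumptions made up for illustration. The only idea it shows is that when each sampling step is differentiable, the reward gradient can flow straight to the parameter, with no policy-gradient estimator needed.

```python
import random

def sample(theta, z, steps=4, decay=0.5):
    """Toy differentiable sampling chain (hypothetical, not a real diffusion model).

    Starts from noise z and applies x <- decay * x + theta for `steps` steps.
    Returns the final sample and its derivative with respect to theta,
    accumulated step by step via the chain rule.
    """
    x, dx = z, 0.0
    for _ in range(steps):
        x = decay * x + theta
        dx = decay * dx + 1.0
    return x, dx

def reward_grad(x, target=2.0):
    """Derivative of the assumed reward r(x) = -(x - target)**2 with respect to x."""
    return -2.0 * (x - target)

def finetune(theta=0.0, lr=0.05, iters=200, batch=32, seed=0):
    """Gradient ascent on the mean reward, differentiating through the sampler."""
    rng = random.Random(seed)
    for _ in range(iters):
        grad = 0.0
        for _ in range(batch):
            x, dx = sample(theta, rng.gauss(0.0, 1.0))
            grad += reward_grad(x) * dx  # dr/dtheta = dr/dx * dx/dtheta
        theta += lr * grad / batch
    return theta

if __name__ == "__main__":
    theta = finetune()
    x, _ = sample(theta, 0.0)
    print(f"theta = {theta:.3f}, sample from zero noise = {x:.3f}")
```

An RL-style method would instead treat the sampler as a black-box policy and estimate the gradient from reward values alone; in this sketch the `dx` term is what makes that unnecessary.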