Super Dario#1776
Introducing Self-Distilled Policy Gradient.
Token-level rewards, credit assignment, self-distillation.
RL and distillation are converging toward the same idea:
Policy gradients, it always has been, it always will be.
https://huggingface.co/papers/2606.04036
10:24 PM · Jun 3, 2026