Self-Distilled Policy Gradient Unifies RL and Distillation at Token Level · Digg