10h ago
Self-Distilled Policy Gradient Unifies RL and Distillation at Token Level
Sentiment: 100% positive, 0% negative (1 comment)
Users find the Self-Distilled Policy Gradient method fascinating because it unifies RL and distillation via token rewards.