8h ago
Omar Khattab critiques traditional on-policy distillation, while Rishabh Agarwal points to Speculative OPD to fix student feedback drift
Speculative OPD rejects flawed student tokens and resamples directly from the teacher.
Sentiment: Pos 90.5%, Neg 9.5%
Many users praised the proposed Pedagogical RL approach for its potential in personalized instruction and sample efficiency, while a few criticized On-Policy Distillation as a lazy, ineffective alternative.
14 comments with sentiment.
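The reject-and-resample idea in the summary can be illustrated with a toy, single-position sketch. The summary only says that flawed student tokens are rejected and replaced by a sample from the teacher. The acceptance rule used here (accept with probability min(1, p_teacher / p_student), borrowed from speculative decoding) is an assumption, as are the function name `speculative_step` and the toy distributions. It is not taken from the Speculative OPD work itself.

```python
import random


def speculative_step(student_probs, teacher_probs, rng):
    """Propose one token from the student, then accept it or replace it.

    Both arguments are probability lists over the same toy vocabulary.
    Returns (token_index, accepted).
    """
    vocab = list(range(len(student_probs)))
    # The student proposes a token from its own distribution (on-policy).
    proposal = rng.choices(vocab, weights=student_probs, k=1)[0]
    # ASSUMPTION: acceptance test borrowed from speculative decoding;
    # the source does not say how a token is judged "flawed".
    ratio = teacher_probs[proposal] / student_probs[proposal]
    if rng.random() < min(1.0, ratio):
        return proposal, True
    # Rejected: resample directly from the teacher, as the summary describes.
    replacement = rng.choices(vocab, weights=teacher_probs, k=1)[0]
    return replacement, False


rng = random.Random(0)
student = [0.7, 0.2, 0.1]   # hypothetical student next-token distribution
teacher = [0.1, 0.6, 0.3]   # hypothetical teacher next-token distribution
tokens = [speculative_step(student, teacher, rng) for _ in range(5)]
```

Standard speculative sampling resamples a rejected token from the normalized residual max(0, p_teacher - p_student) rather than from the teacher distribution itself. The sketch follows the wording of the summary instead, so its output distribution is not guaranteed to match the teacher exactly.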