8h ago

Omar Khattab critiques traditional on-policy distillation, while Rishabh Agarwal points to Speculative OPD to fix student feedback drift

Speculative OPD rejects flawed student tokens, resampling directly from the teacher.

Sentiment

Positive: 90.5%

Negative: 9.5%

Many users praised the proposed Pedagogical RL approach for its potential in personalized instruction and sample efficiency, while a few criticized On-Policy Distillation as a lazy, ineffective alternative.

14 comments with sentiment.
