New blog post: On-Policy Distillation — Promise, Pitfalls, and Prospects.
On-policy distillation (OPD) combines on-policy rollouts with dense teacher supervision.
But it is not a free lunch.
I discuss three failure modes and introduce our new paper.
https://louieworth.github.io/blog/opd_reflection/





