
On-Policy Distillation Joins PapersWithCode With 183 Citing Papers


Original post

One of the hottest terms in AI right now is "on-policy distillation". It is a post-training technique in which a student model, typically an LLM, samples from its own current policy and receives a teacher signal on the states it actually visits. It combines the dense supervision of distillation with the locality of online RL. It is now a method on PapersWithCode! Find all 183 papers that cite it, and more, here: https://paperswithcode.co/methods/on-policy-distillation
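
To make the description in the post concrete, here is a minimal toy sketch of the loop it outlines: the student samples a rollout from its own policy, and the teacher scores every state the student visited. Everything in it is an assumption for illustration, not taken from the post or from any specific paper: the tabular "bigram" policies, the per-state reverse KL loss KL(student || teacher), the choice to treat sampled states as fixed (no gradient through sampling), and the hypothetical function names `sample_rollout`, `distill_step`, and `mean_kl`.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reverse_kl(q, p):
    """KL(q || p) for two discrete distributions with positive entries."""
    return sum(qi * (math.log(qi) - math.log(pi)) for qi, pi in zip(q, p))


def sample_rollout(student_logits, length, rng, start_state=0):
    """Sample from the student's current policy; return the states it visited.

    Toy assumption: the state is simply the previously emitted token.
    """
    states = []
    state = start_state
    for _ in range(length):
        states.append(state)
        probs = softmax(student_logits[state])
        state = rng.choices(range(len(probs)), weights=probs)[0]
    return states


def distill_step(student_logits, teacher_probs, rng, length=8, lr=0.5):
    """One on-policy step: roll out the student, then pull it toward the
    teacher's distribution at every visited state (dense, per-token signal)."""
    for state in sample_rollout(student_logits, length, rng):
        q = softmax(student_logits[state])
        p = teacher_probs[state]
        kl = reverse_kl(q, p)
        for k in range(len(q)):
            # Gradient of KL(softmax(z) || p) with respect to logit z_k.
            grad = q[k] * (math.log(q[k]) - math.log(p[k]) - kl)
            student_logits[state][k] -= lr * grad


def mean_kl(student_logits, teacher_probs):
    """Average reverse KL over all states, used only for monitoring."""
    kls = [reverse_kl(softmax(z), p) for z, p in zip(student_logits, teacher_probs)]
    return sum(kls) / len(kls)
```

In this sketch the teacher is just a fixed table of next-token probabilities and the student is a table of logits. Calling `distill_step` repeatedly with a seeded `random.Random` should drive `mean_kl` toward zero. A real LLM setup would replace the tables with model forward passes and the manual gradient with automatic differentiation.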

7:25 AM · May 25, 2026

