TrOPD Introduces Trust Region Method For Stable On-Policy Distillation · Digg