McGill's Li Jiang analyzes on-policy distillation, finding it exhibits unique geometric training dynamics rather than mirroring SFT or RLVR