Teortaxes argues top-k distillation matches policy-gradient performance with less compute but collapses when using different-origin teacher models · Digg