Cohere's Kris Cao argues the classic imitation learning algorithm DAgger has been rebranded as "on-policy distillation
Both concepts rely on iteratively aggregating expert feedback data.
——0——
Both concepts rely on iteratively aggregating expert feedback data.
Users agree the researcher's observation correctly links DAgger to on-policy distillation, finding the connection insightful.
1 comment with sentiment.