New lecture for the book! Nominally about synthetic data, but mostly is a walk through of the distillation literature from the Hinton 2015 paper to multi-teach on-policy distillation of today!
At 7.4 hours of video in my post-training brain dump and counting :)
It was fun to stare at the math long enough and talk through the 3-4 core changes that needed to be made to the original formulation to have on-policy distillation be ready for the mainstream like it is today (and in RL frameworks).
Otherwise, I include a bit of a history lesson for how synthetic data generally slowly took over all post-training data research (it wasn't always the case)! Then I do some 101 review on constitutional AI, rubrics, and other popular methods.
00:00 The emergence of synthetic data 10:50 Background on teacher-student knowledge-distillation 24:47: On-policy distillation (OPD, MOPD, and OPSD) 37:11 Constitutional AI & AI Feedback 45:50 Rubrics as rewards & conclusions
Ofc, watch on YouTube etc.







