AI2 post-training lead Nathan Lambert releases a free lecture on the evolution of synthetic data and on-policy distillation

VIEWS: 3.6K · BOOKMARKS: 26 · LIKES: 50

Something I should add -- on-policy distillation was the last content I got to sneak into the book before going to print.

It felt very important to have this method covered: it's growing rapidly and is used in distinct ways.

So you can also read what is covered in this lecture!

Nathan Lambert@natolambert

New lecture for the book! Nominally about synthetic data, but mostly a walkthrough of the distillation literature, from the Hinton 2015 paper to the multi-teacher on-policy distillation of today!

At 7.4 hours of video in my post-training brain dump and counting :)

It was fun to stare at the math long enough and talk through the 3-4 core changes that had to be made to the original formulation to make on-policy distillation ready for the mainstream, as it is today (including in RL frameworks).

Otherwise, I include a bit of a history lesson on how synthetic data slowly took over all post-training data research (it wasn't always the case)! Then I do some 101 review of constitutional AI, rubrics, and other popular methods.

00:00 The emergence of synthetic data
10:50 Background on teacher-student knowledge distillation
24:47 On-policy distillation (OPD, MOPD, and OPSD)
37:11 Constitutional AI & AI Feedback
45:50 Rubrics as rewards & conclusions

Ofc, watch on YouTube etc.

RETWEETS: 1

Nathan Lambert@natolambert

@xeophon Real friends say no to maven / masterclass et al

Florian Brand@xeophon

me n who

REPLIES: 5

Florian Brand@xeophon

me n who

Nathan Lambert@natolambert

YT: https://www.youtube.com/watch?v=6nyJ8y8ghsE&list=PLL1tdVxB1CpVpEtMHxwuR4uI4Lxjw00_y&index=11&t=1s

Nathan Lambert@natolambert

https://rlhfbook.com/c/12-synthetic-data#the-path-to-on-policy-teacher-student-distillation

Nathan Lambert@natolambert

Again, the homepage is here: https://rlhfbook.com/course

Pradyumna (in Bay Area)@PradyuPrasad

@xeophon me and @lavanya_g112

forget the grind. small iterative steps. do things@halftroll

i have known about this series for weeks but haven't had time to jump in. learning today that part of what you're dealing with is the elevated way in which synthetic data can be used is quite frankly really exciting so i will find time for the whole series. thanks a lot for all this.

Ben Schulz@schulzb589

@xeophon The guy is a national treasure.

Harald Schäfer@___Harald___

@natolambert This is how openpilot driving models are trained. On-policy in diffusion sim, with a teacher who knows the future showing the student the correct actions.
