RLHF Course Q&A Covers Derivation Fixes and Notation Traps




Original post

Nathan Lambert (@natolambert) · #80 in Tech

I'm doing Q&A videos as I roll through my course. Here's the next one, covering subtle fixes to the on-policy distillation and reward model derivations, common notation traps when doing this math, and additional resources for going deeper (e.g. @johnschulman2's KL estimation blog).

Q&A 2 is here!

00:00 Derivation fixes
06:10 Code examples & additional resources
08:08 Extra RL notation and notes

Keep sending questions on YouTube, GitHub, and Discord. Phoebe and I are loving them.

1:24 PM · Jul 1, 2026 · 3.9K Views

Sentiment

Users praise the RLHF Course Q&A for its attractive thumbnail and long-format content in the style of Andrej.

Positive: 100.0%

Negative: 0.0%

2 comments with sentiment.


Related links

Q&A 2: Mastering the Derivations, Running Algorithms at Home & Notation Gotcha's | RLHF Course (via YouTube)

Posts from X


Views: 3K · Bookmarks: 4 · Replies: 2

Nathan Lambert (@natolambert)

https://youtu.be/gB-bYUECpzE



Likes: 4 · Retweets: 1

Nathan Lambert (@natolambert)

How can you say no to this thumbnail?


khazzan Yassine (@KhazzanYassine)

@natolambert Long format (Andrej style 🙏)
