Nathan Lambert releases an educational book and course on reinforcement learning from human feedback

The project includes companion code repositories and lectures.

Original post

Nathan Lambert@natolambert

The goal with my rlhf book is to make the "home on the internet" for the next generation learning post-training. That's why I'm doing all formats (lectures, code, book, discord, model completions... & ofc blog of interconnects).

A hub is more lasting than non-fiction writing.

7:43 AM · Jun 25, 2026 · 25.8K Views

Sentiment

Many users thank Nathan Lambert for launching his RLHF book, praising it as an accessible resource that clarifies post-training concepts and supports further learning.

Positive: 100.0% · Negative: 0.0% (7 comments with sentiment)

Posts from X

Views 3.9K · Bookmarks 17 · Likes 56 · Retweets 4 · Replies 3

Nathan Lambert@natolambert

I get feedback a lot that is like "your book should be the RL for LLMs book" or "the post-training book" and it's definitely true those would sell more copies.

The reality is that this book was in many ways a side project, and by the time I realized I agreed with a bit of this I didn't have the time for *another* refactor.

At the end of the day, I still dumped as much knowledge as I could from what I was doing into the book, and now the course and the code. In its spirit the book is totally a post-training book.

The process to change this would've delayed the book by anywhere from 3 to 15 months. It is simply an amount of time I didn't have with Interconnects, Olmo, and other life necessities.

So this isn't to say that I'll never do it. Re-prints and new versions are a common thing. It's doable for me to refactor most of the chapters, re-write the introduction, and make it a post-training centric book.

Still, RLHF as a topic deserves a dedicated text and is far from solved. It's a technology that skyrocketed language models to prominence and points to a lot of fundamental problems interfacing the user and the AI.

Much of the content that got me to where I am today in my career came from diving into caring about this interface, so I'm happy for it to have the space to live, breathe and thrive.

So in reality, I probably could've hot-swapped the title to sell more copies, but it would have made me feel dishonest to do so. For anyone wanting to learn post-training, there's nothing in this book that doesn't apply to you -- post-training is just constantly evolving and growing in complexity.

A final nitpick is that RLHF actually matches my more conceptual, intuitive vibe a good amount. Post-training is far more practical, in a data and systems sense, where this is more of a math & intuition book.

Anyways, the RLHF "post-training" Book is coming soon and thank you for trusting me with your attention. 🩵

1h · 3.9K views · 56 likes · 17 bookmarks

Nathan Lambert@natolambert

https://rlhfbook.com/

1d2.7K2116

Yuri Kushch@YuriKushch

@natolambert What prerequisite would you recommend if you’re new to the subject?

1d29431

Shashank Shekhar@sha_shekhar_

@natolambert Thank you Nathan. I really enjoyed reading it a couple of months back.

1d231

Darin@darin_gordon

@natolambert Have you authored a decision tree for deciding effective usage?

1d135

Alberto Fuentes (e/acc)@AlberFuen

@natolambert I truly learnt a lot, many RL concepts that were misunderstood and an amazing point from which to go beyond in the sota. Thanks a lot for making it interesting, to the point and challenging and easy to follow equally.

1d381

Craig Certo@craig_certo

@natolambert Appreciate you sharing the knowledge

1d119

evil rank@rankdim

@natolambert should ve been rl for llms but great work appreciate

1d113

Gerhardt@datgerhardt

@natolambert Thank you 🙏

1d110

Eren | AI x Markets@ErenSignals

@natolambert You can feel the replies writing themselves.

1d96

Adel Bucetta@adelbucetta

@natolambert the reason most people think books are dead is that they're just containers for ideas, not the ideas themselves. your approach turns the container into a living ecosystem

1d77

big goose@Anonyous_FPS

@natolambert But now, knowledge in the AI field is evolving rapidly. How is this update?

1d32

Yingzhe@Yingzhe0301

@natolambert Will you be writing reinforcement learning from verifiable reward 😌

1d29

Gregor@bygregorr

@natolambert not sure the 'hub outlasts the book' assumption holds. code notebooks break on dependency updates, discord goes quiet without active moderation. seen a few ml course hubs go dark within a year. is the written text actually the part that survives when curation energy runs out?

1d27

Byron Gee@Gee_Luyj

@natolambert thanks.

21h11

Ber@BerCam__

@natolambert Hi Nathan! I'm building http://Zyndo.ai a marketplace for AI agents, and I think it could be something interesting for you, happy to show you the app and let you test it for free. Best!

1d2