Nando de Freitas releases reinforcement learning tutorial with notebook
Nando de Freitas released a reinforcement learning tutorial focused on policy gradients. The release includes a Python notebook and the corresponding TeX source files, hosted at love4all.ai. Development used coding assistance from OpenAI GPT, Codex, and AnthropicAI Claude. Tao Xu noted that the notebook folds the KL divergence directly into the advantage calculation and normalization, which couples the KL term with extra variance, unlike the standard weighted combination used in policy-gradient and PPO methods. De Freitas confirmed this was a bug and fixed it.
This is a tutorial on reinforcement learning based on previous posts here. I'm including a policy-gradient Python notebook and the TeX source so it can be translated to other languages to spread knowledge.
@OpenAI GPT & Codex and @AnthropicAI Claude Code helped me. Both were great.
So that people can find these, I am now placing all materials on my first blog website ❤️4∀.ai

The tutorial covers spicy topics like "is reward enough?", but first it provides the foundations: policy gradients, PPO, GRPO, a probabilistic version via expectation maximisation (EM), RL for pretraining via e.g. online EM, imitation via DAgger, self-improvement, and tool-use with GLM
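For readers who want the core idea in code before opening the notebook, here is a minimal sketch of the score-function (REINFORCE) policy-gradient update on a toy three-armed bandit, with the advantage normalized across the batch. Everything in it (the bandit, the batch size, the learning rate) is illustrative; it is not the notebook's code.

```python
# Minimal REINFORCE sketch on a toy 3-armed bandit (illustrative only):
# softmax policy, score-function gradient, batch-normalized advantage.
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.1, 0.5, 0.9])  # hypothetical mean reward per arm
theta = np.zeros(3)                     # policy logits
lr = 0.1

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

for step in range(2000):
    probs = softmax(theta)
    actions = rng.choice(3, size=32, p=probs)       # sample a batch of arms
    rewards = rng.normal(true_means[actions], 0.1)  # noisy rewards
    # Normalize the advantage across the batch (mean-zero, unit std).
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    # For a softmax policy, grad of log pi(a) w.r.t. logits = onehot(a) - probs.
    grad = np.zeros(3)
    for a, A in zip(actions, adv):
        g = -probs.copy()
        g[a] += 1.0
        grad += A * g
    theta += lr * grad / len(actions)   # gradient ascent on expected reward

print("learned policy:", softmax(theta).round(3))  # should favour arm 2
```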
@NandoDF the notebook puts KL into the advantage calculation and normalization; doesn't that couple KL with extra variance?
@txhf That puzzled me too. I left it because the results were ok, but let me ablate against the usual weighted combination. I'll let you know. Thanks Tao 🙏
@txhf You're absolutely right - that was a bug. It didn't show in the results because the example is rather easy. I increased the complexity a tiny bit and fixed it. Thanks 🙏
Actually, for people following this, I would recommend you also try the standard KL estimator.
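To make the fix concrete: the point of the thread is to keep the KL penalty out of the advantage that gets mean/std-normalized, and to add it as a separately weighted loss term instead. Below is a hedged PyTorch sketch using the "k3" per-sample KL estimator (one common reading of "the standard KL estimator"); the function names and the beta value are illustrative assumptions, not the notebook's code.

```python
# Illustrative sketch (not the notebook's code): decouple the KL penalty
# from advantage normalization.
import torch

def kl_k3(logp_theta: torch.Tensor, logp_ref: torch.Tensor) -> torch.Tensor:
    # Per-sample "k3" estimator of KL(pi_theta || pi_ref) for samples drawn
    # from pi_theta: r - 1 - log r, with r = pi_ref / pi_theta. Unbiased,
    # always non-negative, and lower variance than -log r alone.
    log_ratio = logp_ref - logp_theta
    return torch.exp(log_ratio) - 1.0 - log_ratio

def pg_loss(logp_theta, logp_ref, rewards, beta=0.05):
    # Normalize ONLY the reward-derived advantage ...
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    pg_term = -(adv.detach() * logp_theta).mean()
    # ... and add the KL penalty as a separately weighted term, so its
    # scale is not distorted by the batch's mean/std statistics.
    return pg_term + beta * kl_k3(logp_theta, logp_ref).mean()
```

The buggy variant folds the KL penalty into `rewards` before the normalization, which both rescales the penalty by the batch standard deviation and injects the estimator's variance into the advantage; keeping the two terms separate avoids that coupling.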