
Nando de Freitas releases reinforcement learning tutorial with notebook


Nando de Freitas released a reinforcement learning tutorial focused on policy gradients. The release includes a Python notebook and the corresponding TeX source files, hosted at love4all.ai. Development used coding assistance from OpenAI's GPT and Codex and Anthropic's Claude. Tao Xu replied that the notebook places the KL divergence directly inside the advantage calculation and normalization, which couples the KL term with extra variance, rather than keeping it as a separately weighted penalty as standard policy-gradient and PPO formulations do.
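Tao Xu's point can be made concrete with a toy sketch. This is hypothetical code, not taken from the notebook; `beta`, the batch size, and the reward/KL distributions are all invented for illustration. Folding the per-sample KL penalty into the advantage before normalization rescales the KL term by the batch's advantage statistics, whereas the usual weighted combination normalizes the advantage alone and keeps the KL penalty as a separate loss term:

```python
import numpy as np

rng = np.random.default_rng(0)
rewards = rng.normal(1.0, 2.0, size=256)   # per-sample returns (made up)
kl = rng.gamma(2.0, 0.05, size=256)        # per-sample KL(pi || pi_ref) estimates
beta = 0.1                                 # KL coefficient

# Variant A (what the reply describes): KL folded into the advantage
# *before* normalization, so the effective KL weight depends on the
# batch mean/std of (reward - beta * kl) and inherits its variance.
adv_a = rewards - beta * kl
adv_a = (adv_a - adv_a.mean()) / (adv_a.std() + 1e-8)

# Variant B (usual weighted combination): normalize the advantage from
# rewards alone and keep the KL penalty as a separately weighted term.
adv_b = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
kl_penalty = beta * kl.mean()
```

In variant A, after normalization `beta` no longer directly controls the penalty strength: the KL term has been rescaled by whatever the batch's advantage standard deviation happens to be, which is the coupling the reply asks about.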

Original post

This is a tutorial on reinforcement learning based on previous posts here. I'm including a policy gradient python notebook and the tex source so it can be translated to other languages to spread knowledge. https://love4all.ai/ @OpenAI GPT & Codex and @AnthropicAI Claude Code helped me. Both were great. So that people can find these, I am now placing all materials on my first blog website ❤️4∀.ai

11:05 AM · May 14, 2026 · 10.4K Views


The tutorial covers spicy topics like "is reward enough?", but first it provides the foundations: policy gradients, PPO, GRPO, a probabilistic version via expectation maximisation (EM), RL for pretraining via e.g. online EM, imitation via DAgger, self-improvement, and tool-use with GLM

Nando de Freitas @NandoDF

11:15 AM · May 14, 2026 · 1.9K Views
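For readers new to the foundations listed above, a minimal policy-gradient (REINFORCE) sketch may help. This is a generic illustration, not code from the tutorial's notebook; the two-armed bandit, learning rate, and reward means are invented for the example:

```python
import numpy as np

# Minimal REINFORCE on a two-armed Gaussian bandit with a softmax policy.
rng = np.random.default_rng(0)
true_means = np.array([0.0, 1.0])   # arm 1 pays more on average
logits = np.zeros(2)                # policy parameters
lr = 0.1

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for _ in range(2000):
    probs = softmax(logits)
    a = rng.choice(2, p=probs)              # sample an action from the policy
    r = rng.normal(true_means[a], 1.0)      # sample a noisy reward
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0                   # d log pi(a) / d logits
    logits += lr * r * grad_log_pi          # REINFORCE update

p_final = softmax(logits)                   # policy should favour arm 1
```

The update is the score-function gradient `r * grad log pi(a)`; PPO and GRPO refine exactly this quantity with clipping, baselines, and group-relative advantages.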

@txhf That puzzled me too. I left it because the results were ok, but let me ablate against the usual weighted combination. I’ll let you know. Thanks Tao 🙏

Tao Xu @txhf

@NandoDF the notebook puts KL into the advantage calculation and normalization; doesn't that couple KL with extra variance?

9:49 PM · May 14, 2026 · 177 Views
5:09 AM · May 15, 2026 · 41 Views

@txhf You're absolutely right, that was a bug. It didn't show in the results because the example is rather easy. I increased the complexity a tiny bit and fixed it. Thanks 🙏

Actually for people following this, I would recommend you also try the standard KL estimator.
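One common reading of "the standard KL estimator" is the per-sample Monte-Carlo estimators used in PPO-style codebases, often called k1 and k3 after John Schulman's note on approximating KL divergence; whether that is what the tweet intends is an assumption here. A sketch with made-up Gaussians (the distributions and sample count are illustrative):

```python
import numpy as np

# Monte-Carlo estimators of KL(q || p) from samples x ~ q, using only the
# log-ratio logr = log p(x) - log q(x).  k1 is the naive unbiased
# estimator; k3 = r - 1 - log r (with r = p/q) is the lower-variance,
# always-nonnegative unbiased alternative common in RLHF code.
rng = np.random.default_rng(0)
mu_q, mu_p = 0.0, 0.3
x = rng.normal(mu_q, 1.0, size=200_000)           # samples from q = N(0, 1)
logr = (-(x - mu_p) ** 2 + (x - mu_q) ** 2) / 2   # log p(x) - log q(x)

k1 = -logr                        # E[k1] = KL(q || p), high variance
k3 = np.exp(logr) - 1 - logr      # E[k3] = KL(q || p), lower variance, >= 0

true_kl = 0.5 * (mu_p - mu_q) ** 2  # closed form for unit-variance Gaussians
```

Both estimators have the right mean, but k3 concentrates much more tightly around it, which is why it is the usual default.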

10:38 AM · May 15, 2026 · 70 Views
