
Fast-Slow Training outperforms RL baselines on math and code tasks


Fast-Slow Training (FST) interleaves reinforcement learning updates to an LLM's parameters ("slow" weights) with GEPA prompt optimization, treating the optimized context as "fast" weights. Outlined in a blog post on gepa-ai.github.io and discussed by Rishabh Agarwal and Joey Gonzalez, the method surpassed pure-RL baselines on math, code, and reasoning tasks. It reached a higher performance ceiling with fewer samples at budgets up to 32k, retained greater plasticity after initialization, and reduced catastrophic forgetting relative to RL-only and GEPA-only variants. Rohan Anil asked how close standalone GEPA optimization on the final RL-finetuned checkpoint would come to full FST results.
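The interleaving described above can be sketched as a toy loop, assuming nothing beyond the thread's high-level description: a "slow" step folds knowledge into the weights (a stand-in for an RL update) and a "fast" step edits the prompt (a stand-in for GEPA's prompt optimization). Every function name and the toy reward below are hypothetical illustrations, not the actual GEPA API.

```python
# Toy sketch of fast-slow training, assuming only the thread's high-level
# description. "Tasks" are atoms; a task counts as solved once it is stored
# either in the weights (slow) or in the prompt (fast).

def evaluate(weights, prompt, tasks):
    """Fraction of tasks covered by weights + prompt (a stand-in reward)."""
    return sum(1 for t in tasks if t in weights or t in prompt) / len(tasks)

def rl_update(weights, prompt, tasks):
    """Slow learning: fold ONE unsolved task into the weights per round."""
    for t in tasks:
        if t not in weights and t not in prompt:
            return weights | {t}
    return weights

def optimize_prompt(weights, prompt, tasks, budget=2):
    """Fast learning: a GEPA-like prompt edit covering tasks the weights miss."""
    missing = [t for t in tasks if t not in weights and t not in prompt]
    return prompt | set(missing[:budget])

def fast_slow_train(tasks, rounds=5):
    weights, prompt = set(), set()
    for _ in range(rounds):
        weights = rl_update(weights, prompt, tasks)       # slow weights
        prompt = optimize_prompt(weights, prompt, tasks)  # fast context
    return evaluate(weights, prompt, tasks)

def rl_only_train(tasks, rounds=5):
    weights = set()
    for _ in range(rounds):
        weights = rl_update(weights, set(), tasks)
    return evaluate(weights, set(), tasks)

print(fast_slow_train(list(range(10))))  # 1.0: all 10 tasks covered
print(rl_only_train(list(range(10))))    # 0.5: RL alone covers 5 of 10
```

In this toy model the fast step covers several tasks per round while the slow step covers one, loosely mirroring the sample-efficiency gap the post reports; the real method optimizes prompts with GEPA's reflective search rather than set unions.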

Original post

Can LLMs adapt continually without losing base skills? Fast-Slow Training (FST) pairs "slow" weights with "fast" context.

FST vs. RL:
• 3x more sample-efficient
• Higher performance ceiling
• Less KL drift (better plasticity)
• Continual learning: succeeds where RL stalls

8:37 AM · May 13, 2026

@agarwl_ A question I had seeing the graphs: if you have run GEPA on the final RL'ed ckpt, how far from the combined strategy would its performance be?

Rishabh Agarwal @agarwl_

Training LLMs is synonymous with updating their weights. However, LLMs can also learn in-context using *frozen* weights. There is no good reason for restricting learning to being in-context or in-weights.

So a natural idea is "Learning, Fast and Slow" (FST). In FST, slow learning is LLM weights trained with RL while fast learning is context / prompt (fast weights) optimized with GEPA.

Compared to RL, FST performs better while being more data efficient, adaptable (plasticity), and forgetting less (stays closer to base models).

I think this idea of learning both fast-slow weights would be a good foundation for continual learning.

PS: Geoff Hinton (the OG) described the idea of fast weights and slow weights several years ago, and back then I remember thinking it's a very cool idea.

See more details here: https://gepa-ai.github.io/gepa/blog/2026/05/11/learning-fast-and-slow/

12:23 AM · May 15, 2026 · 60.2K Views
1:28 AM · May 15, 2026 · 1.7K Views


@BlackHC Choosing violence today: Hinton and Plaut, 1987. https://www.cs.toronto.edu/~fritz/absps/fastweights.pdf
Andreas Kirsch 🇺🇦 @BlackHC

@agarwl_ Actually pretty sure Schmidhuber coined fast weights in 1991 or so where you slow weights create fast weights 😅

7:02 AM · May 15, 2026 · 1.5K Views
12:41 PM · May 15, 2026 · 2.1K Views

@BlackHC For me, the inspiration was mostly two timescales of learning and the fact that not everything has to go to network weights. Rest is vibes.

Andreas Kirsch 🇺🇦 @BlackHC

Nice! Thanks for sharing! I enjoy being schooled 🤓 Very interesting but also a very different concept and it doesn't create fast weights from slow weights like Schmidhuber's prior art that is equivalent to linear transformers, so it doesn't map to your concept of fast and slow learning via in-context and parameter updates?

2:05 PM · May 15, 2026 · 248 Views
2:35 PM · May 15, 2026 · 216 Views
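The "equivalent to linear transformers" remark in the exchange above refers to the result that Schmidhuber-style fast weights, written by outer products of keys and values and read with a query, compute exactly unnormalized linear attention (Schlag, Irie & Schmidhuber, 2021). A minimal NumPy sketch of that equivalence, with illustrative shapes and names:

```python
# Fast weights vs. linear attention: the slow net emits keys K and values V;
# accumulating outer products v_t k_t^T into a fast weight matrix F and then
# reading F with a query q gives the same result as unnormalized linear
# attention computed directly. Dimensions here are arbitrary toy choices.
import numpy as np

rng = np.random.default_rng(0)
d, T = 4, 6                      # feature dim, sequence length
K = rng.normal(size=(T, d))      # keys produced by the slow net
V = rng.normal(size=(T, d))      # values produced by the slow net
q = rng.normal(size=d)           # query at the final step

# Fast-weight view: write outer products into F, then read with q.
F = np.zeros((d, d))
for k, v in zip(K, V):
    F += np.outer(v, k)          # fast weights programmed by the slow net
out_fast = F @ q

# Linear-attention view: sum_t (q . k_t) v_t, computed directly.
out_attn = (K @ q) @ V

assert np.allclose(out_fast, out_attn)
```

This is a different mechanism from FST's in-context "fast weights" (an optimized prompt), which is exactly the distinction Kirsch raises in the thread.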
