
Fast-Slow Training triples RL sample efficiency


Fast-Slow Training interleaves a fast, GEPA-based context-optimization loop with slow reinforcement learning weight updates. On math, code, and physics tasks, the method reaches three times the sample efficiency of standard RL alone. It attains higher performance ceilings, reduces KL drift from the base model, and maintains plasticity for continued learning.
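
As a rough, self-contained toy of the interleaving pattern (not the paper's implementation — the reward function, update rules, and loop schedule below are invented for illustration, and random local search stands in for GEPA's reflective prompt evolution): the fast loop cheaply searches over a context while the weights are frozen, and the slow loop then consolidates gains into the weights.

```python
# Toy illustration of fast-slow interleaving (assumptions, not the paper's
# algorithm): "fast" = gradient-free local search over a context vector with
# weights frozen; "slow" = gradient steps on the weights with context frozen.
import numpy as np

rng = np.random.default_rng(0)
target = rng.normal(size=8)  # stands in for the task the policy must solve

def reward(weights, context):
    # Higher when weights and context jointly approximate the target.
    return -np.sum((weights + context - target) ** 2)

weights = np.zeros(8)   # slow state: analogous to model weights
context = np.zeros(8)   # fast state: analogous to an optimized prompt/context

for _ in range(50):
    # Fast loop: cheap, sample-based context optimization (weights frozen).
    for _ in range(10):
        candidate = context + 0.1 * rng.normal(size=8)
        if reward(weights, candidate) > reward(weights, context):
            context = candidate
    # Slow loop: gradient ascent on the weights (context frozen);
    # the analytic gradient plays the role of an RL weight update.
    for _ in range(5):
        weights += 0.05 * (-2.0 * (weights + context - target))

print(f"final reward: {reward(weights, context):.4f}")
```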

Original post

Matei Zaharia @matei_zaharia

Really excited about this work that combines GEPA with RL! You get some of the advantages of both, with reflection on rich feedback leading to better weight updates.

11:09 AM · May 13, 2026

Kusha Sareen @KushaSareen

Can LLMs adapt continually without losing base skills? Fast-Slow Training (FST) pairs "slow" weights with "fast" context. FST vs. RL: • 3x more sample-efficient • Higher performance ceiling • Less KL drift (better plasticity) • Continual learning: succeeds where RL stalls

3:37 PM · May 13, 2026 · 113K Views
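
On the "less KL drift" claim above: KL drift is commonly measured as the KL divergence of the trained policy from the frozen base model, averaged over tokens. A minimal sketch of that estimator, assuming PyTorch and logits of shape [batch, seq, vocab] (whether the paper computes it exactly this way is an assumption):

```python
# Minimal KL-drift estimator (an assumption about the metric, not the
# paper's code): mean per-token KL(pi_trained || pi_base) from raw logits.
import torch
import torch.nn.functional as F

def kl_drift(trained_logits: torch.Tensor, base_logits: torch.Tensor) -> torch.Tensor:
    """Mean per-token KL divergence of the trained policy from the base model.

    Both inputs are raw logits of shape [batch, seq, vocab].
    """
    logp_t = F.log_softmax(trained_logits, dim=-1)
    logp_b = F.log_softmax(base_logits, dim=-1)
    per_token_kl = (logp_t.exp() * (logp_t - logp_b)).sum(dim=-1)  # [batch, seq]
    return per_token_kl.mean()

# Example with random logits standing in for two models' outputs:
print(float(kl_drift(torch.randn(2, 5, 100), torch.randn(2, 5, 100))))
```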

seems this method could implement the karpathy "small cognitive core" concept

8:43 PM · May 14, 2026 · 96 Views
