Fast-Slow Training triples RL sample efficiency
Fast-Slow Training interleaves a fast GEPA-based context optimization loop with slow reinforcement learning weight updates. On math, code, and physics tasks the method reaches three times the sample efficiency of standard RL alone. It attains higher performance ceilings, reduces KL drift from the base model, and maintains plasticity for continued learning.
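To make the interleaving concrete, here is a minimal Python sketch of the fast/slow alternation described above. It is an illustration under assumptions, not the paper's implementation: `optimize_context` (standing in for the GEPA-style reflective prompt optimizer) and `rl_update` (standing in for one RL weight update) are hypothetical placeholders.

```python
def optimize_context(policy, context, tasks):
    """Placeholder: reflect on rollout feedback and return an improved context."""
    return context  # a real implementation would rewrite the prompt here

def rl_update(policy, context, tasks):
    """Placeholder: one RL weight update on rollouts conditioned on `context`."""
    return policy  # a real implementation would apply a policy gradient here

def fast_slow_training(policy, tasks, rounds=10, fast_steps=20, slow_steps=100):
    """Alternate a fast in-context optimization loop with slow RL weight updates."""
    context = ""  # "fast" memory: cheap to update, weights stay frozen
    for _ in range(rounds):
        # Fast loop: refine the context via reflection on rich feedback.
        for _ in range(fast_steps):
            context = optimize_context(policy, context, tasks)
        # Slow loop: RL updates that absorb what the optimized context encodes.
        for _ in range(slow_steps):
            policy = rl_update(policy, context, tasks)
    return policy, context
```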
Really excited about this work that combines GEPA with RL! You get some of the advantages of both, with reflection on rich feedback leading to better weight updates.
Can LLMs adapt continually without losing base skills? Fast-Slow Training (FST) pairs "slow" weights with "fast" context. FST vs. RL: • 3x more sample-efficient • Higher performance ceiling • Less KL drift (better plasticity) • Continual learning: succeeds where RL stalls
@matei_zaharia Converting recurring reasoning traces into skills and using them adaptively in-context.
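A hypothetical sketch of the idea in this reply: distill recurring traces into reusable "skills" and retrieve the most relevant ones into the prompt. The `SkillStore` class and the use of `difflib` similarity as a stand-in for embedding retrieval are illustrative assumptions, not the paper's design.

```python
from difflib import SequenceMatcher

class SkillStore:
    def __init__(self):
        self.skills = []  # (name, body) pairs distilled from reasoning traces

    def add_from_trace(self, name, trace):
        # In practice a model would summarize the trace into a reusable rule;
        # here we store the raw trace as the skill body.
        self.skills.append((name, trace))

    def retrieve(self, task, k=2):
        # Rank skills by crude text similarity to the task and return the
        # top-k, formatted for injection into the model's context.
        ranked = sorted(
            self.skills,
            key=lambda s: SequenceMatcher(None, s[1], task).ratio(),
            reverse=True,
        )
        return "\n".join(f"Skill {name}: {body}" for name, body in ranked[:k])
```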
Seems this method could implement Karpathy's "small cognitive core" concept.