Fast-Slow Training triples RL sample efficiency
Fast-Slow Training interleaves a fast GEPA-based context optimization loop with slow reinforcement learning weight updates. On math, code, and physics tasks the method reaches three times the sample efficiency of standard RL alone. It attains higher performance ceilings, reduces KL drift from the base model, and maintains plasticity for continued learning.
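To make the interleaving concrete, here is a minimal Python sketch of the fast/slow alternation described above. It is an illustration under assumptions, not the paper's implementation: `optimize_context` (standing in for the GEPA-style reflective prompt optimizer) and `rl_update` (standing in for one RL weight update) are hypothetical placeholders.

```python
def optimize_context(policy, context, tasks):
    """Placeholder: reflect on rollout feedback and return an improved context."""
    return context  # a real implementation would rewrite the prompt here

def rl_update(policy, context, tasks):
    """Placeholder: one RL weight update on rollouts conditioned on `context`."""
    return policy  # a real implementation would apply a policy gradient here

def fast_slow_training(policy, tasks, rounds=10, fast_steps=20, slow_steps=100):
    """Alternate a fast in-context optimization loop with slow RL weight updates."""
    context = ""  # "fast" memory: cheap to update, weights stay frozen
    for _ in range(rounds):
        # Fast loop: refine the context via reflection on rich feedback.
        for _ in range(fast_steps):
            context = optimize_context(policy, context, tasks)
        # Slow loop: RL updates that absorb what the optimized context encodes.
        for _ in range(slow_steps):
            policy = rl_update(policy, context, tasks)
    return policy, context
```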
Really excited about this work that combines GEPA with RL! You get some of the advantages of both, with reflection on rich feedback leading to better weight updates.
Can LLMs adapt continually without losing base skills? Fast-Slow Training (FST) pairs "slow" weights with "fast" context. FST vs. RL: • 3x more sample-efficient • Higher performance ceiling • Less KL drift (better plasticity) • Continual learning: succeeds where RL stalls
@matei_zaharia Converting recurring reasoning traces into skills and using them adaptively in-context.
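A hypothetical sketch of the idea in this reply: distill recurring traces into reusable "skills" and retrieve the most relevant ones into the prompt. The `SkillStore` class and the use of `difflib` similarity as a stand-in for embedding retrieval are illustrative assumptions, not the paper's design.

```python
from difflib import SequenceMatcher

class SkillStore:
    def __init__(self):
        self.skills = []  # (name, body) pairs distilled from reasoning traces

    def add_from_trace(self, name, trace):
        # In practice a model would summarize the trace into a reusable rule;
        # here we store the raw trace as the skill body.
        self.skills.append((name, trace))

    def retrieve(self, task, k=2):
        # Rank skills by crude text similarity to the task and return the
        # top-k, formatted for injection into the model's context.
        ranked = sorted(
            self.skills,
            key=lambda s: SequenceMatcher(None, s[1], task).ratio(),
            reverse=True,
        )
        return "\n".join(f"Skill {name}: {body}" for name, body in ranked[:k])
```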
Seems this method could implement Karpathy's "small cognitive core" concept.