Learning Fast and Slow triples LLM sample efficiency
Learning Fast and Slow (FST) trains large language models by pairing slow reinforcement learning weight updates with fast prompt and context optimization over a frozen base model, using mechanisms such as GEPA for rapid in-context adaptation. The authors report three times the sample efficiency of standard reinforcement learning, higher performance ceilings, reduced KL drift, and improved resistance to catastrophic forgetting, while preserving plasticity for later tasks. The work also draws on gain-tuning from the paper Adaptive denoising via GainTuning.
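The fast/slow split described above can be sketched as a toy training loop: several cheap in-context optimization rounds (prompt selection with weights frozen) interleaved with one conservative weight update. This is a minimal illustrative sketch, not FST's actual algorithm; the reward function, prompt candidates, and update rule are all hypothetical stand-ins.

```python
import random

def reward(weights: float, prompt_quality: float) -> float:
    # Toy stand-in for task reward produced by the base model.
    return weights * prompt_quality

def fast_prompt_step(weights: float, prompts: list[float]) -> float:
    # Fast loop: choose the best prompt with weights frozen
    # (in-context adaptation on the frozen base model).
    return max(prompts, key=lambda p: reward(weights, p))

def slow_weight_step(weights: float, best_prompt: float, lr: float = 0.1) -> float:
    # Slow loop: small RL-style weight update toward higher reward;
    # a small learning rate limits drift from the base model.
    return weights + lr * reward(weights, best_prompt)

def train(steps: int = 5, fast_per_slow: int = 4) -> list[float]:
    rng = random.Random(0)
    weights = 1.0
    prompts = [rng.random() for _ in range(8)]
    history = []
    for _ in range(steps):
        # Many fast in-context steps per slow weight update.
        for _ in range(fast_per_slow):
            best = fast_prompt_step(weights, prompts)
        weights = slow_weight_step(weights, best)
        history.append(reward(weights, best))
    return history
```

In this toy setting the fast steps are free of weight updates, so sample cost is concentrated in the rare slow steps, loosely mirroring the sample-efficiency argument made in the paper.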