
Fast-slow training pairs slow reinforcement learning weight updates with fast GEPA prompt optimization to outperform standard training on math, code, and reasoning tasks

The approach uses less data while preserving the model's plasticity and reducing forgetting.
