prime-rl can now train 1T-parameter MoE models blazingly fast: under 5 minutes per step, or 1k steps in ~3 days
To achieve this, we shipped the following in our latest release, prime-rl 0.6.0:
* inference: wide-ep, fp8 inference, llm-d router, mooncake, kv cache cpu offloading
* training: fsdp2, deep-ep expert parallelism, dsa cp, fp8 training, router replay
* agentic rollout: we rewrote the core of our rollout orchestrator for better scalability
* model support: glm5, kimi, nemotron, ...
prime-rl is open source, and it is also optimized end to end to run on our dedicated RL infra and compute layer
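
As a rough sanity check on the step-time figure, here is a minimal sketch that converts a per-step time into wall-clock days. The helper name `days_for_steps` is hypothetical and not part of prime-rl. It assumes a constant step time and no downtime between steps.

```python
# Back-of-the-envelope check of the step-time claim.
# Assumption: every step takes the same time and steps run back to back.
MINUTES_PER_DAY = 24 * 60


def days_for_steps(num_steps: int, minutes_per_step: float) -> float:
    """Wall-clock days needed for num_steps at a fixed per-step time."""
    return num_steps * minutes_per_step / MINUTES_PER_DAY


# 5 min/step is the stated upper bound, so this is the worst case.
print(f"{days_for_steps(1000, 5.0):.2f} days")  # prints "3.47 days"
```

At the 5-minute upper bound, 1k steps take about 3.5 days, so landing at ~3 days implies an average step time somewhat under 5 minutes.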