SGLang lead developer Banghua Zhu advocates decoupling training systems from rollout and inference services to support large-scale agentic RL

Original post

Happy to see more RL systems moving toward this deployment shape.

This has been one of the core ideas behind AstraFlow since our earliest design work: large-scale agentic RL should move beyond the trainer-centered “engine mode” and toward independently managed rollout/inference and training systems, connected by a clean rollout + weight-sync contract.
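A minimal sketch of what such a rollout + weight-sync contract could look like, assuming a simple in-process setup. Every name here (`RolloutService`, `generate`, `sync_weights`, `Trainer`, `ToyRolloutService`, `policy_version`) is hypothetical and illustrative, not AstraFlow's actual API. The point is that the trainer only calls two methods on the rollout side, so any backend implementing them could be swapped in.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass
class Trajectory:
    prompt: str
    response: str
    reward: float
    policy_version: int  # version of the weights that produced this sample


class RolloutService(Protocol):
    """Hypothetical contract: the only calls the trainer makes on rollout."""

    def generate(self, prompts: List[str]) -> List[Trajectory]: ...

    def sync_weights(self, version: int, weights: Dict[str, float]) -> None: ...


class ToyRolloutService:
    """Hypothetical stand-in backend; a real one would wrap an inference engine."""

    def __init__(self) -> None:
        self.version = 0
        self.weights: Dict[str, float] = {}

    def generate(self, prompts: List[str]) -> List[Trajectory]:
        # Dummy policy: reverse the prompt; dummy reward: prompt length.
        return [Trajectory(p, p[::-1], float(len(p)), self.version) for p in prompts]

    def sync_weights(self, version: int, weights: Dict[str, float]) -> None:
        # Reject out-of-order pushes so rollouts never regress to older weights.
        if version <= self.version:
            raise ValueError("stale weight version")
        self.version = version
        self.weights = dict(weights)  # copy, so trainer-side edits don't leak


class Trainer:
    """Depends only on the contract, not on any specific rollout engine."""

    def __init__(self, rollout: RolloutService) -> None:
        self.rollout = rollout
        self.version = 0
        self.weights: Dict[str, float] = {"w": 0.0}

    def step(self, prompts: List[str]) -> List[Trajectory]:
        batch = self.rollout.generate(prompts)
        # Toy "update": nudge one scalar weight toward the batch mean reward.
        mean_reward = sum(t.reward for t in batch) / len(batch)
        self.weights["w"] += 0.1 * (mean_reward - self.weights["w"])
        self.version += 1
        # Publish the new weights so later rollouts use the updated policy.
        self.rollout.sync_weights(self.version, self.weights)
        return batch


if __name__ == "__main__":
    trainer = Trainer(ToyRolloutService())
    first = trainer.step(["ab", "abcd"])
    second = trainer.step(["x"])
    print([t.policy_version for t in first], second[0].policy_version)
```

Tagging each sample with `policy_version` lets the trainer see which weights generated it, which becomes important once rollout and training run on separate, independently scaled services.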

In AstraFlow (https://github.com/Infini-AI-Lab/astraflow), we have been building in this direction through rollout/trainer service decoupling, bring-your-own rollout services, flexible dataflow, and support for heterogeneous rollout backends.

Excited to see the broader community converging on this architecture. I believe this is where large-scale agentic RL infrastructure is heading.

8:20 AM · Jun 5, 2026 · 7.6K Views