Happy to see more RL systems moving toward this deployment shape.
This has been one of the core ideas behind AstraFlow since our early design: large-scale agentic RL should move beyond trainer-centered “engine mode” and toward independently managed rollout/inference and training systems, connected by a clean rollout + weight-sync contract.
In AstraFlow (https://github.com/Infini-AI-Lab/astraflow), we have been building toward this direction through rollout/trainer service decoupling, bring-your-own rollout service, flexible dataflow, and support for heterogeneous rollout backends.
Excited to see the broader community converging on this architecture. I believe this is where large-scale agentic RL infrastructure is heading.