Very exciting work on bridging the gap between RL and mid/pretraining.
You can learn from your environment beyond the reward signal by doing next-token prediction on some of your tool-call output.
True agents model the world.
Current training provides no separation between agent and environment: pretraining trains only world modeling, RL trains only agentic actions. We combine both using ECHO by @DimitrisPapail and @VaishShrivas.
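
A rough sketch of the idea, not ECHO's actual objective (which isn't spelled out here): split a trajectory's tokens into agent actions and environment output, apply a policy-gradient term to the first and a next-token prediction loss to the second. The function name `combined_loss`, the REINFORCE-style action term, and the `env_weight` mixing coefficient are all illustrative assumptions.

```python
from typing import List


def combined_loss(
    logprobs: List[float],
    is_env: List[bool],
    advantage: float,
    env_weight: float = 0.5,  # hypothetical mixing coefficient, not from the source
) -> float:
    """Toy per-trajectory loss mixing an RL term and a world-modeling term.

    logprobs[i] is the model's log-probability of token i in the trajectory.
    is_env[i] is True when token i came from the environment (e.g. tool-call
    output selected for prediction) and False when the agent produced it.
    """
    if len(logprobs) != len(is_env):
        raise ValueError("logprobs and is_env must have the same length")

    action_lp = [lp for lp, env in zip(logprobs, is_env) if not env]
    env_lp = [lp for lp, env in zip(logprobs, is_env) if env]

    # Assumed REINFORCE-style term: push up action log-probs in proportion
    # to the advantage (averaged over action tokens).
    pg_loss = -advantage * sum(action_lp) / max(len(action_lp), 1)

    # Next-token prediction term: mean negative log-likelihood of the
    # environment tokens, independent of the reward.
    nll_loss = -sum(env_lp) / max(len(env_lp), 1)

    return pg_loss + env_weight * nll_loss
```

With `env_weight=0.0` this reduces to the reward-only term, so the second term is exactly the extra signal coming from the environment's own output.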






