True agents model the world.
Current training provides no separation between agent and environment: pre-training only trains world modeling, RL only agentic actions. We combine both using ECHO by @DimitrisPapail and @VaishShrivas.
The technique runs next-token prediction on the outputs of agent tool calls
Very exciting work to bridge the gap between RL and mid/pretraining
You can learn from your environment beyond the reward signal by doing next-token prediction on some of your tool-call outputs
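To make "next-token prediction on tool-call output" concrete, here is a minimal sketch of how a trajectory could be split into environment tokens (trained with a next-token-prediction loss) and agent tokens (trained with RL). The `Segment` type, the role names `"assistant"` and `"tool"`, and the `build_masks` helper are all hypothetical illustrations, not the ECHO or prime-rl API; token IDs are assumed to be produced by some tokenizer upstream.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical trajectory segment: "assistant" = agent action, "tool" = environment output
    role: str
    token_ids: list[int]


def build_masks(segments: list[Segment]) -> tuple[list[int], list[int], list[int]]:
    """Flatten a trajectory into (token_ids, sft_mask, rl_mask).

    Masks are indexed by target token: a 1 in sft_mask means the next-token
    prediction loss for that token is counted; a 1 in rl_mask means the token
    is an agent action that receives the RL (reward-based) update.
    """
    token_ids: list[int] = []
    sft_mask: list[int] = []
    rl_mask: list[int] = []
    for seg in segments:
        if seg.role not in ("assistant", "tool"):
            raise ValueError(f"unknown role: {seg.role!r}")
        is_tool = seg.role == "tool"
        for tok in seg.token_ids:
            token_ids.append(tok)
            sft_mask.append(1 if is_tool else 0)  # environment tokens: world modeling
            rl_mask.append(0 if is_tool else 1)   # agent tokens: policy learning
    return token_ids, sft_mask, rl_mask


# Usage: an agent call, the tool's response, then another agent turn
traj = [Segment("assistant", [1, 2]), Segment("tool", [3, 4, 5]), Segment("assistant", [6])]
ids, sft_mask, rl_mask = build_masks(traj)
print(sft_mask)  # [0, 0, 1, 1, 1, 0]
```

In standard RL fine-tuning the tool tokens would typically be masked out entirely; the point of the sketch is that they become a second training signal instead of being discarded. Training on only "some" of the tool output, as the post says, would mean zeroing parts of `sft_mask`.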
"ECHO is a promising technique that seems to work at scale. We believe that it will soon become an important part of open model training [...] Therefore, we will soon support it in a highly flexible and performant manner in prime-rl."
Thank you @PrimeIntellect ❤️

We show strong results in the under-resourced programming language Forth and evaluate generalization to unrelated environments.
We also characterize what aspects of an environment lead to overfitting when using ECHO, how model behavior is impacted, and much more.

Read more:
https://www.primeintellect.ai/blog/true-agents-model-the-world/

By performing SFT on tool outputs and RL on the assistant tokens, we can efficiently teach the model the environment dynamics. This happens on-policy: the LLM models the environment not in a vacuum but in response to its own actions.
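A minimal sketch of how the two objectives could be combined on a single on-policy trajectory, assuming per-token log-probabilities of the sampled tokens are already available. The REINFORCE-style surrogate with a scalar `advantage` and the weighting `sft_coef` are assumptions chosen for illustration; the actual RL objective and weighting used by ECHO or prime-rl may differ. Plain floats are used only to show the arithmetic; a real trainer would compute this on tensors in an autodiff framework so gradients flow through the log-probs.

```python
def masked_mean(values: list[float], mask: list[int]) -> float:
    # Average only the positions where mask == 1 (0.0 if nothing is selected)
    count = sum(mask)
    if count == 0:
        return 0.0
    return sum(v for v, m in zip(values, mask) if m) / count


def combined_loss(token_logprobs: list[float], is_tool: list[bool],
                  advantage: float, sft_coef: float = 1.0) -> float:
    """Hypothetical combined objective: SFT on tool outputs + RL on assistant tokens.

    token_logprobs[i] is the model's log-probability of the i-th sampled token,
    computed on a trajectory the current policy generated (on-policy).
    """
    tool_mask = [1 if t else 0 for t in is_tool]
    assistant_mask = [0 if t else 1 for t in is_tool]
    # SFT term: next-token negative log-likelihood on environment (tool output) tokens
    sft_loss = -masked_mean(token_logprobs, tool_mask)
    # RL term: REINFORCE-style surrogate, scaling agent-token log-probs by the advantage
    pg_loss = -advantage * masked_mean(token_logprobs, assistant_mask)
    return pg_loss + sft_coef * sft_loss


# Usage: two agent tokens followed by two tool-output tokens, advantage 2.0
loss = combined_loss([-0.5, -1.0, -2.0, -0.25], [False, False, True, True], advantage=2.0)
print(loss)  # 2.625
```

Because the tool outputs are the environment's responses to the model's own sampled actions, the SFT term teaches environment dynamics on exactly the states the policy visits, which is what "on-policy" refers to here.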
love to see it!
Great blog post from Prime, envs are about to gain a whole new side use case

@DimitrisPapail @PrimeIntellect Could you please release the training set for the ECHO paper? It’s a really nice post-training recipe :)

@PrimeIntellect @DimitrisPapail @VaishShrivas I'm so glad someone finally tried this out and published the results, thank you! This has big implications if scaled up sufficiently.