Prime Intellect releases ECHO, a training framework combining RL and SFT to help agents predict tool-call outcomes

Story Overview

Prime Intellect describes ECHO as a hybrid method that layers supervised fine-tuning on environment observations directly into reinforcement learning loops, letting agents practice forecasting the results of tool calls instead of relying solely on reward signals.

Original post
samsja@samsja19

Very exciting work to bridge the gap between RL and mid/pretraining

You can learn from your environment beyond the reward signal by doing next token prediction on some of your tool call output

Prime Intellect@PrimeIntellect

True agents model the world.

Current training provides no separation between agent and environment: pre-training only trains world modeling, RL only agentic actions. We combine both using ECHO by @DimitrisPapail and @VaishShrivas.

2:25 PM · Jun 10, 2026 · 13.2K Views
Developer Impact

Blending Pretraining with Agent Actions

The approach runs next-token prediction on tool-response tokens while still optimizing the assistant and tool-call tokens via RL, which the post suggests could close the usual gap between world modeling and action selection.
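As a rough illustration of that split, the sketch below combines a policy-gradient term on agent tokens with a next-token-prediction term on tool-response tokens. It is not taken from the ECHO blog: the role labels, the `Token` type, the `hybrid_loss` function, and the simplified REINFORCE-style surrogate are all assumptions made for illustration, and the default weight of 0.05 simply echoes a figure mentioned by one commenter.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical role labels; the post does not name them.
AGENT_ROLES = {"assistant", "tool_call"}
ENV_ROLE = "tool_response"


@dataclass
class Token:
    role: str       # which part of the rollout produced this token
    logprob: float  # log-probability the current model assigns to it


def _mean(values: List[float]) -> float:
    """Average of a list, or 0.0 when the list is empty."""
    return sum(values) / len(values) if values else 0.0


def hybrid_loss(tokens: List[Token], advantage: float,
                sft_weight: float = 0.05) -> float:
    """Illustrative combined loss for one rollout.

    Agent tokens get a simplified policy-gradient surrogate
    (-advantage * logprob). Tool-response tokens, which pure RL
    would mask out, get a next-token-prediction loss (-logprob)
    scaled by sft_weight.
    """
    rl_terms = [-advantage * t.logprob for t in tokens if t.role in AGENT_ROLES]
    sft_terms = [-t.logprob for t in tokens if t.role == ENV_ROLE]
    return _mean(rl_terms) + sft_weight * _mean(sft_terms)


# Toy rollout: two agent tokens followed by one environment token.
rollout = [
    Token("assistant", -1.0),
    Token("tool_call", -2.0),
    Token("tool_response", -4.0),
]
print(hybrid_loss(rollout, advantage=0.5))                  # with the SFT term
print(hybrid_loss(rollout, advantage=0.5, sft_weight=0.0))  # RL term only
```

Setting `sft_weight=0.0` recovers the pure-RL case in which tool responses contribute nothing to the loss. A production trainer would typically use a clipped or group-normalized objective rather than this bare surrogate.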

Open Question

Release Details Stay Sparse

No code or weights appear in the blog, and no firm integration timeline is given. A passage quoted in the posts below does say Prime Intellect "will soon support it" in prime-rl, so the open question is when ECHO lands there and in what form.

Sentiment

Commenters describe ECHO's merging of RL and pretraining as cool and sensible. One argues that it follows from an information-theoretic view of RL, and another praises the patience and effort behind the experiments.

Positive: 100.0% · Negative: 0.0% (6 comments with sentiment)
Cluster Engagement
Posts from X
Most Activity
Views 649 · Bookmarks 6 · Likes 13 · Replies 1
elie@eliebakouch

early results on how to use tool call result to make agent model the world

amazing work (and blog) by @omouamoua

https://www.primeintellect.ai/blog/true-agents-model-the-world

36m · Views 649 · Likes 13 · Bookmarks 6
Retweets 3

"ECHO is a promising technique that seems to work at scale. We believe that it will soon become an important part of open model training [...] Therefore, we will soon support it in a highly flexible and performant manner in prime-rl."

Thank you @PrimeIntellect ❤️

13h · Views 5.6K · Likes 92 · Bookmarks 20
hallerite@hallerite

@DimitrisPapail @PrimeIntellect it just makes sense if you take an information theoretic view on RL. we need to squeeze out as much signal as we can from each rollout :)

12h · Views 126 · Likes 8 · Bookmarks 1
Tim Kostolansky@thkostolansky

@DimitrisPapail @PrimeIntellect its cool work sir

12h · Views 108 · Likes 6

@thkostolansky @PrimeIntellect I appreciate more than anything the patience, focus, and effort that these experiments require, irrespective of the final outcome. Thank you for contributing to open research and for overall being freaking awesome.

12h · Views 94 · Likes 6
Mariusz Kurman@mkurman88

@samsja19 bruh Feb 12, 2025: https://github.com/mkurman/grpo-llm-evaluator

12h · Views 121 · Likes 1 · Bookmarks 1
Shuozhe Li@ShuozheL

@DimitrisPapail @PrimeIntellect Glad see people made same finding. Recommend to read our work from last years https://arxiv.org/abs/2507.02834v3 We actually find that the data used for SFT term better to be "in-distribution" and generated by the learning model with extra hint so the policy gradient is more effective.

1h · Views 10
Tim Kostolansky@thkostolansky

@DimitrisPapail @PrimeIntellect 💞

12h · Views 26 · Likes 1
Andrea Miele@andreamiele_

@DimitrisPapail @PrimeIntellect Could you please release the training set for the ECHO paper ? it’s a really nice post training recipe :)

13h · Views 68
Lunari@0x_lun

@eliebakouch @omouamoua the key insight is that pure rl leaves tool responses masked so the model never learns to predict its own environment

unmasking those tokens at even 0.05 weight is basically free and changes what the model can plan around

32m · Views 5
Shuozhe Li@ShuozheL

@DimitrisPapail @PrimeIntellect This also helps the model to learn to solve harder questions which model has hard time to sample correct answer.

1h · Views 3