Tech · 2h ago

Prime Intellect releases ECHO, a training framework combining RL and SFT to help agents predict tool-call outcomes

Story Overview

Prime Intellect describes ECHO as a hybrid method that layers supervised fine-tuning on environment observations directly into reinforcement learning loops, letting agents practice forecasting the results of tool calls instead of relying solely on reward signals.

Original post

samsja (@samsja19)

Very exciting work to bridge the gap between RL and mid/pretraining

You can learn from your environment beyond the reward signal by doing next token prediction on some of your tool call output

Prime Intellect (@PrimeIntellect)

True agents model the world.

Current training provides no separation between agent and environment: pre-training only trains world modeling, RL only agentic actions. We combine both using ECHO by @DimitrisPapail and @VaishShrivas.

2:25 PM · Jun 10, 2026 · 13.2K Views

Developer Impact

Blending Pretraining with Agent Actions

The approach runs next-token prediction on tool-response tokens while still optimizing the assistant and tool-call tokens via RL, which the post suggests could close the usual gap between world modeling and action selection.
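The split described above can be sketched as a per-token loss with two masks. This is a minimal illustration, not Prime Intellect's implementation: the role tags, the `hybrid_loss` function, the REINFORCE-style surrogate, and the `obs_weight` default are all assumptions made for the example (the 0.05 value echoes a figure mentioned by a commenter in the thread, not a confirmed setting).

```python
import math

# Hypothetical role tags for the tokens of one rollout (illustrative names).
PROMPT = "prompt"        # system/user tokens, excluded from both loss terms
ACTION = "action"        # assistant text and tool-call tokens sampled by the policy
OBSERVATION = "obs"      # tool-response tokens produced by the environment


def hybrid_loss(logprobs, roles, advantage, obs_weight=0.05):
    """Return (rl_term, sft_term, total) for a single rollout.

    logprobs:   log-probability the model assigns to each token in the rollout
    roles:      one role tag per token, same length as logprobs
    advantage:  scalar advantage for the whole rollout (assumed REINFORCE-style)
    obs_weight: weight on the next-token-prediction term (assumed value)
    """
    if len(logprobs) != len(roles):
        raise ValueError("logprobs and roles must have the same length")

    action_lp = [lp for lp, r in zip(logprobs, roles) if r == ACTION]
    obs_lp = [lp for lp, r in zip(logprobs, roles) if r == OBSERVATION]

    # Policy-gradient surrogate on action tokens: -A * mean(log pi(token)).
    rl_term = -advantage * sum(action_lp) / len(action_lp) if action_lp else 0.0
    # Cross-entropy on tool-response tokens: -mean(log p(token)).
    sft_term = -sum(obs_lp) / len(obs_lp) if obs_lp else 0.0

    return rl_term, sft_term, rl_term + obs_weight * sft_term


# Toy rollout: one prompt token, two action tokens, two tool-response tokens,
# then one more action token.
roles = [PROMPT, ACTION, ACTION, OBSERVATION, OBSERVATION, ACTION]
logprobs = [math.log(p) for p in (0.9, 0.5, 0.5, 0.25, 0.25, 0.5)]
rl_term, sft_term, total = hybrid_loss(logprobs, roles, advantage=1.0)
print(f"rl={rl_term:.4f} sft={sft_term:.4f} total={total:.4f}")
```

Setting `obs_weight=0.0` recovers the pure-RL case in which tool-response tokens are masked out of the loss entirely, which makes the extra term easy to ablate.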

Open Question

Release Details Stay Sparse

The blog includes no code, weights, or integration timeline. A passage quoted by Dimitris Papailiopoulos says Prime Intellect "will soon support" ECHO in prime-rl, but it gives no date.

Sentiment

Commenters call ECHO's merging of RL and pretraining a cool, sensible approach that makes sense from an information-theoretic view, and they praise the effort behind the experiments.

Pos: 100.0% · Neg: 0.0% · 6 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 649 · Bookmarks: 6 · Likes: 13 · Replies: 1

elie (@eliebakouch)

early results on how to use tool call result to make agent model the world

amazing work (and blog) by @omouamoua

https://www.primeintellect.ai/blog/true-agents-model-the-world

Prime Intellect (@PrimeIntellect)

True agents model the world.

36m ago

Retweets: 3

Dimitris Papailiopoulos (@DimitrisPapail)

"ECHO is a promising technique that seems to work at scale. We believe that it will soon become an important part of open model training [...] Therefore, we will soon support it in a highly flexible and performant manner in prime-rl."

Thank you @PrimeIntellect ❤️

Prime Intellect (@PrimeIntellect)

True agents model the world.

13h ago

hallerite (@hallerite)

@DimitrisPapail @PrimeIntellect it just makes sense if you take an information theoretic view on RL. we need to squeeze out as much signal as we can from each rollout :)

12h ago

Tim Kostolansky (@thkostolansky)

@DimitrisPapail @PrimeIntellect its cool work sir

12h ago

Dimitris Papailiopoulos (@DimitrisPapail)

@thkostolansky @PrimeIntellect I appreciate more than anything the patience, focus, and effort that these experiments require, irrespective of the final outcome. Thank you for contributing to open research and for overall being freaking awesome.

12h ago

Mariusz Kurman (@mkurman88)

@samsja19 bruh Feb 12, 2025: https://github.com/mkurman/grpo-llm-evaluator

12h ago

Shuozhe Li (@ShuozheL)

@DimitrisPapail @PrimeIntellect Glad see people made same finding. Recommend to read our work from last years https://arxiv.org/abs/2507.02834v3 We actually find that the data used for SFT term better to be "in-distribution" and generated by the learning model with extra hint so the policy gradient is more effective.

1h ago

Tim Kostolansky (@thkostolansky)

@DimitrisPapail @PrimeIntellect 💞

12h ago

Andrea Miele (@andreamiele_)

@DimitrisPapail @PrimeIntellect Could you please release the training set for the ECHO paper ? it’s a really nice post training recipe :)

13h ago

Lunari (@0x_lun)

@eliebakouch @omouamoua the key insight is that pure rl leaves tool responses masked so the model never learns to predict its own environment

unmasking those tokens at even 0.05 weight is basically free and changes what the model can plan around

32m ago

Shuozhe Li (@ShuozheL)

@DimitrisPapail @PrimeIntellect This also helps the model to learn to solve harder questions which model has hard time to sample correct answer.

1h ago