Dimitris Papailiopoulos from Microsoft Research shares ECHO method for training CLI agents with environment prediction loss

Dimitris Papailiopoulos of Microsoft Research AI Frontiers posted results on ECHO, a method that adds an environment prediction loss to standard GRPO training for command-line agents. Instead of masking terminal outputs out of the loss, ECHO trains on both the agent's actions and the terminal's responses, using the same rollout and forward pass. The post reports improved benchmark scores across Qwen3 models. Researcher John Langford noted that forecasting terminal command outputs accelerates reinforcement learning for agents operating in command-line environments.
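
As a rough illustration of the idea in the summary above, the sketch below combines a policy term on agent tokens with a cross-entropy term on terminal-output tokens from the same rollout. It is not the authors' implementation: the policy term is a simplified REINFORCE-style surrogate rather than the clipped GRPO objective, and the function name `echo_style_loss` and the weight `env_weight` are hypothetical.

```python
import math

def echo_style_loss(token_logprobs, is_action, advantages, env_weight=0.1):
    """Policy term on agent tokens plus cross-entropy on terminal tokens.

    token_logprobs: log-probability the model assigns to each rollout token
    is_action:      True for agent-generated tokens, False for terminal output
    advantages:     per-token advantage (only read at agent positions)
    env_weight:     hypothetical coefficient, not taken from the source
    """
    # Simplified policy-gradient surrogate on agent tokens only.
    action_terms = [-a * lp
                    for lp, m, a in zip(token_logprobs, is_action, advantages) if m]
    # Negative log-likelihood (cross-entropy) on terminal-output tokens.
    env_terms = [-lp for lp, m in zip(token_logprobs, is_action) if not m]
    policy_loss = sum(action_terms) / max(len(action_terms), 1)
    env_loss = sum(env_terms) / max(len(env_terms), 1)
    return policy_loss + env_weight * env_loss, policy_loss, env_loss

# Toy rollout: agent token, terminal token, agent token, terminal token.
logprobs = [math.log(0.5), math.log(0.25), math.log(0.5), math.log(0.125)]
is_action = [True, False, True, False]
advantages = [1.0, 0.0, -1.0, 0.0]
total, policy, env = echo_style_loss(logprobs, is_action, advantages)
```

Both terms read the same per-token log-probabilities, which is why the extra supervision needs no additional rollout or forward pass in this sketch.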


@DimitrisPapail @ChengleiSi This is nice work, so sorry for distracting from the substance, but I'm genuinely curious:

> This work was done at AI Frontiers, a boutique research lab inside Microsoft Research.

What does "boutique research lab" mean here?

Dimitris Papailiopoulos @DimitrisPapail

http://x.com/i/article/2056344151235387392

1:38 PM · May 18, 2026 · 88K Views
4:22 PM · May 18, 2026 · 3K Views

@AlexGDimakis @DimitrisPapail @ChengleiSi Hehe

Alex Dimakis @AlexGDimakis

@giffmana @DimitrisPapail @ChengleiSi Lucas, in a world of commodity models and scaled slop, a boutique research lab proposes something more deliciously bold: Think of Mozambique cashmere agents, asymmetrical overall environments and locally sourced world model custom losses.

6:07 PM · May 18, 2026 · 290 Views
6:18 PM · May 18, 2026 · 47 Views

@DimitrisPapail This work is so cool as always @DimitrisPapail , and you are too kind!!

Dimitris Papailiopoulos @DimitrisPapail

I'm just glad we did this before @lateinteraction and his amazing students :p

4:36 PM · May 18, 2026 · 656 Views
6:54 PM · May 18, 2026 · 75 Views

Improve your agents with one weird trick: ECHO says, when you SFT an agent, do not train it to predict only the agent replies, but also the terminal responses. When you GRPO, you use the same rollout to predict the terminal responses with cross entropy loss. It's basically free and gets extra supervision from the CLI. This apparently helps the model develop a 'world model' of the terminal, and improves performance, which was very surprising to me.

Dimitris Papailiopoulos @DimitrisPapail

http://x.com/i/article/2056344151235387392

1:38 PM · May 18, 2026 · 88K Views
5:26 PM · May 18, 2026 · 2.4K Views
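
The SFT half of the post above comes down to which tokens count toward the loss. A minimal sketch, assuming a transcript split into role-tagged segments (the role names and the helper `loss_mask` are hypothetical, not from the ECHO release): the usual setup trains only on agent tokens, while the variant described also trains on terminal tokens.

```python
def loss_mask(segments, include_terminal):
    """Return a 0/1 mask over tokens; 1 means the token contributes to the loss.

    segments: list of (role, tokens) with role in {'prompt', 'agent', 'terminal'}
    include_terminal: False mimics agent-only SFT; True also trains on
                      terminal responses, as the post describes
    """
    mask = []
    for role, tokens in segments:
        train = role == "agent" or (include_terminal and role == "terminal")
        mask.extend([1 if train else 0] * len(tokens))
    return mask

# Toy transcript with made-up tokens.
segments = [
    ("prompt", ["list", "files"]),
    ("agent", ["ls", "-la"]),
    ("terminal", ["total", "0"]),
    ("agent", ["done"]),
]
agent_only = loss_mask(segments, include_terminal=False)    # [0, 0, 1, 1, 0, 0, 1]
with_terminal = loss_mask(segments, include_terminal=True)  # [0, 0, 1, 1, 1, 1, 1]
```

The prompt stays masked in both cases; only the terminal segments change status.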

@giffmana @DimitrisPapail @ChengleiSi Lucas, in a world of commodity models and scaled slop, a boutique research lab proposes something more deliciously bold: Think of Mozambique cashmere agents, asymmetrical overall environments and locally sourced world model custom losses.

Lucas Beyer (bl16) @giffmana

@DimitrisPapail @ChengleiSi This is nice work, so sorry for distracting from the substance, but I'm genuinely curious: > This work was done at AI Frontiers, a boutique research lab inside Microsoft Research. What does "boutique research lab" mean here?

4:22 PM · May 18, 2026 · 3K Views
6:07 PM · May 18, 2026 · 290 Views

@NovaSkyAI here's a simple skyRL patch to train better CLI agents, for free

Dimitris Papailiopoulos @DimitrisPapail

http://x.com/i/article/2056344151235387392

1:38 PM · May 18, 2026 · 88K Views
1:47 PM · May 18, 2026 · 859 Views

@giffmana @ChengleiSi A small group, of talented people, that are given free space to explore ideas that matter in the broader scope of AI, and specifically the area of computer use agents, but don't cost 1M to test :)

Lucas Beyer (bl16) @giffmana

@DimitrisPapail @ChengleiSi This is nice work, so sorry for distracting from the substance, but I'm genuinely curious: > This work was done at AI Frontiers, a boutique research lab inside Microsoft Research. What does "boutique research lab" mean here?

4:22 PM · May 18, 2026 · 3K Views
4:26 PM · May 18, 2026 · 1.1K Views

@giffmana @ChengleiSi I came up with the phrasing, because it reminds me of how I'd describe with two words DM in its early days. One can only hope to come approximately close to that intellectual and technical space.

Dimitris Papailiopoulos @DimitrisPapail

@giffmana @ChengleiSi A small group, of talented people, that are given free space to explore ideas that matter in the broader scope of AI, and specifically the area of computer use agents, but don't cost 1M to test :)

4:26 PM · May 18, 2026 · 1.1K Views
4:27 PM · May 18, 2026 · 423 Views

@giffmana @ChengleiSi also thanks for reading up to that part :D I know you have a ton of cool stuff to work on today, so I'm grateful for your time.

Dimitris Papailiopoulos @DimitrisPapail

@giffmana @ChengleiSi I came up with the phrasing, because it reminds me of how I'd describe with two words DM in its early days. One can only hope to come approximately close to that intellectual and technical space.

4:27 PM · May 18, 2026 · 423 Views
4:28 PM · May 18, 2026 · 342 Views

I'm just glad we did this before @lateinteraction and his amazing students :p

Dimitris Papailiopoulos @DimitrisPapail

http://x.com/i/article/2056344151235387392

1:38 PM · May 18, 2026 · 88K Views
4:36 PM · May 18, 2026 · 656 Views

@AlexGDimakis @giffmana @ChengleiSi lol

Alex Dimakis @AlexGDimakis

@giffmana @DimitrisPapail @ChengleiSi Lucas, in a world of commodity models and scaled slop, a boutique research lab proposes something more deliciously bold: Think of Mozambique cashmere agents, asymmetrical overall environments and locally sourced world model custom losses.

6:07 PM · May 18, 2026 · 290 Views
6:09 PM · May 18, 2026 · 181 Views

Turns out training your agent to be a world simulator improves its accuracy of solving problems

Yifu Qiu@ICLR 2026 @yifuqiu98

Internalizing world modeling as a native ability for agents.

2:45 PM · May 18, 2026 · 8K Views
2:48 PM · May 18, 2026 · 8.5K Views

Very rarely you stumble on a method that's simple, obvious in hindsight, free, and touches on every problem you care about: CLI agents, continual learning, self-improvement, world models.

ECHO is one of those

Dimitris Papailiopoulos @DimitrisPapail

http://x.com/i/article/2056344151235387392

1:38 PM · May 18, 2026 · 88K Views
4:00 PM · May 18, 2026 · 25.1K Views

Lol you can continual learn by training on terminal outputs WITHOUT REWARDS

Dimitris Papailiopoulos @DimitrisPapail

http://x.com/i/article/2056344151235387392

1:38 PM · May 18, 2026 · 88K Views
1:50 PM · May 18, 2026 · 5.4K Views

Prediction: by end of 2026 Echo will be part of standard agent RL trainers.

FREE LUNCH FOR EVERYONE

Dimitris Papailiopoulos @DimitrisPapail

http://x.com/i/article/2056344151235387392

1:38 PM · May 18, 2026 · 88K Views
1:43 PM · May 18, 2026 · 4.2K Views

World modeling. Faster RL. Self-improvement without verifiers.

All from one extra loss term on your favorite open-weights CLI agent.

Happy Monday!

Dimitris Papailiopoulos @DimitrisPapail

http://x.com/i/article/2056344151235387392

1:38 PM · May 18, 2026 · 88K Views
1:41 PM · May 18, 2026 · 22.7K Views

@ChenhaoTan Thanks for checking out! I agree. You don't get too many of those in your career, so happy we stumbled upon it

Chenhao Tan @ChenhaoTan

Always a good sign that you are surprised that something has not been done before!

3:47 PM · May 18, 2026 · 828 Views
3:48 PM · May 18, 2026 · 88 Views

A fun result: training to predict terminal output significantly accelerates RL for terminal agents.

Dimitris Papailiopoulos @DimitrisPapail

http://x.com/i/article/2056344151235387392

1:38 PM · May 18, 2026 · 88K Views
2:12 PM · May 18, 2026 · 1.5K Views

god what a beautiful objective. i wonder how general you can push this. best non-distillation answer ive seen for knowledge acq during RL, feels bitter-pilled in a way that most self-teaching methods aren’t.

Dimitris Papailiopoulos @DimitrisPapail

http://x.com/i/article/2056344151235387392

1:38 PM · May 18, 2026 · 88K Views
8:07 PM · May 18, 2026 · 213 Views

Always a good sign that you are surprised that something has not been done before!

Dimitris Papailiopoulos @DimitrisPapail

http://x.com/i/article/2056344151235387392

1:38 PM · May 18, 2026 · 88K Views
3:47 PM · May 18, 2026 · 828 Views

How do machines build a mental map of reality? 🧠

Check out this frontier investigation into *world models* from our team at @ms_aifrontiers. Proud to see @DimitrisPapail and colleagues pushing the boundaries of how we think about AI reasoning.

Dimitris Papailiopoulos @DimitrisPapail

World modeling. Faster RL. Self-improvement without verifiers. All from one extra loss term on your favorite open-weights CLI agent. Happy Monday!

1:41 PM · May 18, 2026 · 22.7K Views
4:26 PM · May 18, 2026 · 2.8K Views

@DimitrisPapail 🫪

Dimitris Papailiopoulos @DimitrisPapail

World modeling. Faster RL. Self-improvement without verifiers. All from one extra loss term on your favorite open-weights CLI agent. Happy Monday!

1:41 PM · May 18, 2026 · 22.7K Views
4:51 PM · May 18, 2026 · 206 Views

@DimitrisPapail great work @DimitrisPapail

Dimitris Papailiopoulos @DimitrisPapail

http://x.com/i/article/2056344151235387392

1:38 PM · May 18, 2026 · 88K Views
6:18 PM · May 18, 2026 · 108 Views

Wonderful. The terminal is the world to an agent. It learns to model the world

Dimitris Papailiopoulos @DimitrisPapail

Very rarely you stumble on a method that's simple, obvious in hindsight, free, and touches on every problem you care about: CLI agents, continual learning, self-improvement, world models. ECHO is one of those

4:00 PM · May 18, 2026 · 25.1K Views
4:21 PM · May 18, 2026 · 3.6K Views

FYI, I will bet my last nickel this is part of Anthropic's secret sauce

Super Dario @inductionheads

Wonderful. The terminal is the world to an agent. It learns to model the world

4:21 PM · May 18, 2026 · 3.6K Views
7:22 PM · May 18, 2026 · 628 Views