Microsoft Researchers Use ECHO to Train SLMs via RL Exploration Alone

Aradhye Agarwal@AradhyeAgarwal

This is actually pretty interesting and surprisingly useful.

Context: Some of us from MSFTResearch were planning to train an SLM with a fixed harness (essentially optimising the model for the harness).

One person raised a valid point: we would need coding data in addition to RL environments, since an SLM's world knowledge is poor due to size constraints. That is a problem because the model would not know which libraries to call or how to use them.

I realised the inherent problem in RL is that the model learns no world knowledge. This is because the environment's output tokens are masked out of the loss, so no external signal gets internalised by the model.
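A minimal sketch of the masking described above, assuming the masked tokens are the ones returned by the environment (for example, tool or library output) and that the objective is a simple advantage-weighted log-probability loss. The function name `masked_pg_loss` and the toy numbers are hypothetical, and this illustrates the standard masking only, not ECHO itself.

```python
def masked_pg_loss(token_logprobs, is_model_token, advantage=1.0):
    """Advantage-weighted loss averaged over model-generated tokens only.

    Tokens with is_model_token == False (assumed here to be environment
    output) are skipped, so they contribute nothing to the loss.
    """
    total = 0.0
    count = 0
    for logprob, keep in zip(token_logprobs, is_model_token):
        if keep:
            total += -advantage * logprob
            count += 1
    return total / count if count else 0.0


# Hypothetical trajectory: two model tokens (a tool call), two environment
# tokens (the tool's reply), then one more model token.
mask = [True, True, False, False, True]
loss = masked_pg_loss([-0.5, -1.0, -3.0, -4.0, -0.5], mask)

# Same trajectory, but the model is far worse at predicting the tool's reply.
loss_worse_env = masked_pg_loss([-0.5, -1.0, -9.0, -9.0, -0.5], mask)
```

In this sketch `loss` and `loss_worse_env` are identical: how well the model predicts the environment's reply never reaches the loss, which is the sense in which no external signal is internalised.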

A very simple fix we're considering is ECHO, which would let us get away with just RL environments, since the model would learn library behaviour through its own exploration.

Dimitris Papailiopoulos@DimitrisPapail

http://x.com/i/article/2056344151235387392

1:41 AM · Jun 7, 2026 · 4.9K Views
