
Microsoft Researchers Use ECHO to Train SLMs via RL Exploration Alone

Aradhye Agarwal@AradhyeAgarwal

This is actually pretty interesting and surprisingly useful.

Context: Some of us at Microsoft Research were planning to train an SLM (small language model) with a fixed harness, essentially optimising the model for that harness.

One person raised a valid point: we would need coding data in addition to RL environments, because SLMs have poor world knowledge due to their size. Without that data, the model would not know which libraries to call or how to use them.

I realised the inherent problem with RL here is that the model learns no world knowledge. The tokens returned by the environment (e.g. tool or library output) are masked out of the loss, so none of that external signal gets internalised by the model.
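To make the masking point concrete, here is a minimal pure-Python sketch. It assumes a simple REINFORCE-style objective with one scalar advantage per rollout. `Segment`, `build_loss_mask`, and `masked_pg_loss` are hypothetical names used only for illustration; they are not an API from the post or from any library. Tokens produced by the environment get mask 0, so they contribute nothing to the loss and their log-probabilities receive no gradient.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical container for one chunk of a multi-turn rollout."""
    tokens: List[int]
    is_policy: bool  # True: sampled by the model; False: environment/tool output


def build_loss_mask(segments: List[Segment]) -> List[float]:
    """Return 1.0 for model-generated tokens and 0.0 for environment tokens."""
    mask: List[float] = []
    for seg in segments:
        mask.extend([1.0 if seg.is_policy else 0.0] * len(seg.tokens))
    return mask


def masked_pg_loss(logprobs: List[float], mask: List[float], advantage: float) -> float:
    """REINFORCE-style loss averaged over unmasked tokens only.

    Masked (environment) tokens are multiplied by 0.0, so their
    log-probabilities never influence the loss or its gradient.
    """
    total = sum(m * lp for m, lp in zip(mask, logprobs))
    n = sum(mask)
    return -advantage * total / n if n else 0.0


if __name__ == "__main__":
    # Model emits 2 tokens, a tool returns 3 tokens, model emits 1 more token.
    rollout = [Segment([1, 2], True), Segment([3, 4, 5], False), Segment([6], True)]
    mask = build_loss_mask(rollout)
    logprobs = [-0.5, -1.0, -2.0, -2.0, -2.0, -0.5]
    print(masked_pg_loss(logprobs, mask, advantage=1.5))  # prints 1.0
```

Because the tool-output positions are zeroed, changing their log-probabilities leaves the loss unchanged. That is exactly why, under this setup, nothing the environment returns is learned.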

A very simple fix we are considering is ECHO, which would let us get away with RL environments alone, since the model would learn library behaviour through its own exploration.

http://x.com/i/article/2056344151235387392

1:41 AM · Jun 7, 2026 · 4.9K Views