Microsoft Researchers Use ECHO to Train SLMs via RL Exploration Alone · Digg