Exciting work! But in our February paper, "Reinforcement Learning with Text Feedback", we proposed the same methodology: predicting environment feedback on top of the RL loss. Nice to see this idea specialized to agentic terminal tasks, along with the new insight it brings 💡. [1/2]