This is a very good entry point into post-training LLMs with RL. The whole recipe and data are open. Highly recommend!
Trained some terminal agents with friends!
Introducing Tmax, open RL terminal agent models. Under default settings and shorter (65k) token budgets, Tmax outperforms prior open work on terminal use. We are releasing all data+weights+rollouts publicly!
