Top post: @AlexGDimakis “We are excited to announce what we have been working on for more than six months: The OpenThoughts-Agent dataset and OpenThinker-agent models. More than 100 ablations on data curation for RL environments for coding agents. Our data recipe is SOTA over all open-data agents in their class. We post-train a Qwen-3-32B to get 26% on Terminal Bench and open all our training sets, data pipelines, experiments and models. Some lessons we learned for training agents vs reasoning:
1. The diversity of tasks matters more than it does for reasoning (OpenThoughts-Agent vs OpenThoughts). You could teach reasoning from math and it transferred widely, but RL environments seem to teach more specific capabilities, so each domain must be covered.
2. Filtering for high-quality and hard questions remains very important (was also true for OpenThoughts reasoning). We discuss several ways of filtering.
3. Synthetic re-writing and task augmentation didn’t give significant benefits in our experiments. Sampling multiple teacher rollouts per task did work (was also true for reasoning). Even when keeping the dataset size fixed, multiple answers gave benefits. The multiple-answers mystery is still valid for agentic environments.
4. Stronger models are not necessarily better teachers (was also true for reasoning). The stronger teacher for Qwen-3 was GLM-4.7-AWQ and the Terminus2 harness in Daytona. We are releasing 100k tasks and trajectories.
5. Benefits from GRPO remain limited and still ongoing. I currently officially hate GRPO.”
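Lesson 3 compares datasets of the same size built from different numbers of teacher rollouts per task. A minimal sketch of how such a fixed-budget comparison could be set up is below. The function name, the `sample_rollout` callback, the `fake_teacher` stand-in, and the budget numbers are all illustrative assumptions, not taken from the OpenThoughts-Agent pipeline.

```python
import random
from typing import Callable, List, Tuple


def build_fixed_budget_dataset(
    tasks: List[str],
    budget: int,
    rollouts_per_task: int,
    sample_rollout: Callable[[str, int], str],
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Return exactly `budget` (task, trajectory) pairs, k rollouts per task.

    Hypothetical helper: with the budget held fixed, a larger k means
    fewer distinct tasks, each answered several times.
    """
    if budget % rollouts_per_task != 0:
        raise ValueError("budget must be divisible by rollouts_per_task")
    num_tasks = budget // rollouts_per_task
    if num_tasks > len(tasks):
        raise ValueError("not enough tasks for this budget")
    rng = random.Random(seed)
    chosen = rng.sample(tasks, num_tasks)  # distinct tasks, no repeats
    return [
        (task, sample_rollout(task, attempt))
        for task in chosen
        for attempt in range(rollouts_per_task)
    ]


def fake_teacher(task: str, attempt: int) -> str:
    # Stand-in for a real teacher rollout (assumption for illustration only).
    return f"trajectory for {task!r}, attempt {attempt}"


tasks = [f"task-{n}" for n in range(100)]
# Same total size (40 examples), different rollouts-per-task settings.
single = build_fixed_budget_dataset(tasks, 40, 1, fake_teacher)
multi = build_fixed_budget_dataset(tasks, 40, 4, fake_teacher)
```

Both `single` and `multi` contain 40 examples, but `multi` covers only 10 distinct tasks with 4 trajectories each, which is the kind of trade-off between task coverage and answers per task that the post describes.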