FrontierCS releases FrontierSmith for open-ended coding data
FrontierCS has released FrontierSmith, a pipeline that starts from closed-ended coding problems, mutates them into open-ended variants, filters the outputs, and wraps the survivors in runnable optimization environments. In the team's experiments, models trained on FrontierSmith data outperform models trained on human-curated open-ended datasets on the FrontierCS benchmarks, ALE-bench, and KernelBench, all of which measure long-horizon coding-agent performance.
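As a rough illustration of the release described above, here is a minimal Python sketch of the pipeline's shape (mutate, filter, build environments). Everything here is an illustrative placeholder, not FrontierSmith's actual API: the callables, the prompt, and the function name are all assumptions.

```python
def frontiersmith_style_pipeline(closed_tasks, mutate_llm, filter_fn, build_env):
    """Hedged sketch: turn closed-ended tasks into open-ended environments.

    mutate_llm: callable mapping a prompt string to a rewritten task string.
    filter_fn: callable that rejects ill-posed or trivial variants.
    build_env: callable that wraps a task in a runnable, scoreable environment.
    All three are hypothetical stand-ins for whatever the real system uses.
    """
    open_envs = []
    for task in closed_tasks:
        # Relax the exact-answer requirement into a measurable objective.
        variant = mutate_llm(
            "Rewrite this closed-ended coding problem as an open-ended "
            "optimization task with a measurable objective:\n" + task
        )
        if filter_fn(variant):  # keep only well-posed, non-trivial variants
            open_envs.append(build_env(variant))
    return open_envs
```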
Love the name :)
Open-ended coding training data may no longer be the bottleneck: AI can generate open-ended tasks at scale, and even outperform human-expert curation. The FrontierCS team is releasing FrontierSmith, a system for synthesizing open-ended coding problems at scale. Starting from closed-ended coding tasks, FrontierSmith mutates them into open-ended variants, filters the results, and builds runnable optimization environments for long-horizon coding agents. In our experiments, FrontierSmith data trains stronger models than human-curated open-ended data on FrontierCS and ALE-bench.
Blog: https://frontier-cs.org/blog/frontiersmith/
Paper: https://arxiv.org/abs/2605.14445
Code: https://github.com/FrontierCS/FrontierSmith
Model: https://huggingface.co/runyuanhe/qwen35-9b-frontiersmith
There is a very interesting idea in this paper: how do you judge whether an optimization problem created by an LLM is ‘interesting’ or ‘valuable’? The proposed measure, called *idea divergence*, asks LLMs to solve the task multiple times and counts how many distinct strategies are used and perform well. We could not measure this kind of solution diversity objectively before LLMs; now a few prompts suffice.
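For readers who want the gist in code, here is a minimal sketch of how such a measure could be computed, assuming hypothetical `solver_llm`, `judge_llm`, and `score_fn` callables; none of these names, prompts, or thresholds come from the paper.

```python
import collections

def idea_divergence(task, solver_llm, judge_llm, score_fn,
                    n_samples=8, score_threshold=0.5):
    """Count distinct strategies among well-performing LLM solutions.

    solver_llm / judge_llm: callables mapping a prompt string to a string.
    score_fn: runs a solution in the task's environment, returns a score.
    All names and prompts are illustrative, not the paper's interface.
    """
    labels = []
    for _ in range(n_samples):
        solution = solver_llm(f"Solve this optimization task:\n{task}")
        if score_fn(task, solution) < score_threshold:
            continue  # only solutions that actually perform well count
        # Ask an LLM to name the high-level strategy, e.g. "greedy",
        # "simulated annealing", "dynamic programming".
        label = judge_llm(
            "In a few words, name the high-level strategy this solution "
            "uses:\n" + solution
        ).strip().lower()
        labels.append(label)
    # Divergence: how many distinct strategies succeeded, plus their counts.
    return len(set(labels)), collections.Counter(labels)
```

A practical refinement would be to cluster the free-text strategy labels instead of matching exact strings, since LLM labels for the same strategy tend to vary in wording.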