1d ago

DSGym, a framework for evaluating and training data science agents, has been accepted to ICML 2026; it shows that many existing benchmarks can be solved through shortcuts, without analyzing the actual data

The authors also release DSGym-Tasks, a curated task ecosystem that filters out shortcut-solvable tasks and expands coverage with new scientific tasks.


Original post

#690 @JAMES_Y_ZOU (OP)

Fan Nie @FANNIE1208

🚀 Excited to share that #DSGym has been accepted to ICML 2026!

DSGym is a holistic, unified framework for evaluating and training data science agents with standardized abstractions and a modular architecture for adding tasks, agent scaffolds, and tools.

In this work, we:

🔍 Show that existing data science benchmarks are vulnerable to shortcuts: agents can often solve tasks without using the actual data.

📊 Release DSGym-Tasks, a curated task ecosystem that standardizes and audits representative benchmarks, filters shortcut-solvable tasks, and expands coverage with new scientific tasks.

⚡ Use DSGym for execution-grounded trajectory synthesis: with only 2K samples, we train a 4B model that outperforms GPT-4o on standardized data analysis benchmarks.

📄 Paper: https://arxiv.org/abs/2601.16344
💻 Code: https://github.com/fannie1208/DSGym
🤗 Dataset: https://huggingface.co/DSGym

🧵👇

4:48 PM · May 18, 2026

Reposted by

#1229 @LUPANTECH

QUOTE POST

#690 James Zou @JAMES_Y_ZOU

The best gym for data science💪: #DSGym provides a grounded and realistic environment to train and test data science agents.

Accepted to #ICML2026! Great work by @FanNie1208 @JunlinWang3 @_harperhua @federicobianchy @ykwon_0407 @ZhentingQi @oq_35 @ShangZhu18 @togethercompute

Fan Nie @FanNie1208

11:48 PM · May 18, 2026 · 21.7K Views

2:24 PM · May 19, 2026 · 4.2K Views

