3d ago

Kevin Li releases largest open agentic trace dataset

Kevin Li, a master's student in computer science at Stanford, released SWE-ZERO-12M-trajectories, which he describes as the largest open agentic trace dataset. The collection holds 12 million trajectories and 112 billion tokens drawn from 122,000 pull requests across 3,000 repositories in 16 programming languages. According to the announcement, it is 5.7 times larger than the previous largest open dataset of its kind. It is hosted on Hugging Face. Researchers including Percy Liang and David Hall reposted the announcement.
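To put the headline figures in proportion, here is a minimal sketch that derives per-item averages from them. The inputs are the rounded numbers quoted in the announcement, so the results are approximate. The constant and key names are illustrative and not taken from the dataset.

```python
# Headline figures as stated in the announcement (rounded values).
TOKENS = 112_000_000_000
TRAJECTORIES = 12_000_000
PULL_REQUESTS = 122_000
REPOSITORIES = 3_000

# Simple averages; real per-item counts will vary widely around these.
averages = {
    "tokens_per_trajectory": TOKENS / TRAJECTORIES,
    "trajectories_per_pull_request": TRAJECTORIES / PULL_REQUESTS,
    "pull_requests_per_repository": PULL_REQUESTS / REPOSITORIES,
}

for name, value in averages.items():
    print(f"{name}: {value:,.0f}")
```

On these rounded inputs, the averages come to roughly 9,300 tokens per trajectory, about 98 trajectories per pull request, and about 41 pull requests per repository.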

Original post

Introducing SWE-ZERO-12M-trajectories: the largest agentic trace dataset in the open, 5.7x larger than the previous largest. 112B tokens · 12M trajectories · 122K PRs · 3K repos · 16 languages https://huggingface.co/datasets/AlienKevin/SWE-ZERO-12M-trajectories

9:33 AM · May 13, 2026
Reposted by

Going into the next Marin run.

Kevin Li @kevin_x_li

4:33 PM · May 13, 2026 · 47.3K Views
4:47 PM · May 13, 2026 · 19.8K Views

SWE-ZERO uses mini-SWE-agent, by @KLieret and me, as the agent harness

Absolutely mindblowing the scale at which SWE-bench data has scaled.

A year ago, SWE-smith had 50k instances across 128 repos. @kevin_x_li's SWE-Zero blows this out of the water. 👏

3:31 PM · May 14, 2026 · 3.2K Views
Kevin Li releases largest open agentic trace dataset · Digg