SWE-bench creator John Yang opens public submissions for ProgramBench, which tests whether language models can rebuild programs from scratch
The launch open-sources 2,000 agent trajectories so that results can be replicated.
Users are excited that ProgramBench is being adopted by Anthropic and by the teams behind GLM and Kimi, since the benchmark opens up opportunities to study long-horizon coding.