SWE-bench creator John Yang opens public submissions for ProgramBench, which tests whether language models can rebuild programs from scratch · Digg

Posts from X

John Yang@jyangballin

- Tutorial: https://programbench.com/blog/submission-guide/
- Submissions repository: https://github.com/ProgramBench/submissions
- mini-SWE-agent runs: https://github.com/orgs/ProgramBench/repositories
- ProgramBench repo: https://github.com/facebookresearch/ProgramBench

ProgramBench is a joint effort across Meta FAIR, Meta TBD, Stanford, Harvard @KLieret (co-first author) @18jeffreyma @parth007_96 @dpedch @sten_sootla @micmylin @pengchengyin @magpie_rayhou @syhw @Diyi_Yang @OfirPress

John Yang@jyangballin

We’re really excited about the adoption of ProgramBench (Anthropic, GLM, Kimi). Looking forward to supporting investigations of long-horizon coding at scale.

Expect more models, more harnesses, multi-agent approaches, and much more in the coming months!

Kilian Lieret@KLieret

Multiagents never quite gained a foothold on SWE-Bench. ProgramBench might be different. Looking forward to a new scaffold engineering race!

John Yang@jyangballin

ProgramBench is now accepting submissions to the official leaderboard!

We also open sourced *all* of the 2000+ agent trajectories and codebases from our paper’s main results.

John Yang@jyangballin

We wrote a short guide on how to create a ProgramBench submission. tl;dr:

1. `programbench submit package`
2. Fill out some metadata
3. `programbench submit publish`
4. `programbench submit register`

Ensures results are fully transparent and reproducible.
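
The four steps above could be sketched as a small Python wrapper. This is only an illustration: the three `programbench submit` subcommands are quoted from the post, but running them with no extra arguments is an assumption, and the helper names `plan_submission` and `submit` are hypothetical. The linked tutorial is the authority on actual usage. The sketch defaults to a dry run that prints the steps without executing anything.

```python
import shutil
import subprocess

# Subcommands quoted in the post. Invoking them with no additional
# arguments is an assumption; check the submission guide for real usage.
STEPS_BEFORE_METADATA = [["programbench", "submit", "package"]]
STEPS_AFTER_METADATA = [
    ["programbench", "submit", "publish"],
    ["programbench", "submit", "register"],
]


def plan_submission():
    """Return the ordered steps as (kind, payload) tuples (hypothetical helper)."""
    plan = [("run", cmd) for cmd in STEPS_BEFORE_METADATA]
    # Step 2 in the post is manual: the metadata is filled out by hand.
    plan.append(("manual", "Fill out the submission metadata"))
    plan.extend(("run", cmd) for cmd in STEPS_AFTER_METADATA)
    return plan


def submit(dry_run=True):
    """Walk through the plan in order; a failing command stops the run."""
    if not dry_run and shutil.which("programbench") is None:
        raise RuntimeError("programbench CLI not found on PATH")
    for kind, payload in plan_submission():
        if kind == "manual":
            print(f"[manual] {payload}")
            if not dry_run:
                input("Press Enter when done... ")
        else:
            print("[run] " + " ".join(payload))
            if not dry_run:
                # check=True raises CalledProcessError on a non-zero exit code.
                subprocess.run(payload, check=True)


if __name__ == "__main__":
    submit(dry_run=True)
```

Keeping the manual metadata step inside the plan makes the ordering explicit: publishing and registering only happen after packaging succeeds and the metadata has been filled in.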

Florian Brand@xeophon

@jyangballin @KLieret Exciting!!
