
Researchers Launch ProgramBench For Flexible Whole-Repository Code Generation

Original post
Ofir Press @OfirPress

ProgramBench is the first whole-repository-generation benchmark that also allows agents to pick *which* language they're going to use and *how* they're going to implement the given program. w/ @jyangballin @KLieret @18jeffreyma

2:41 PM · Jun 5, 2026 · 2.4K Views
Ofir Press @OfirPress

@jyangballin @KLieret @18jeffreyma Full ProgramBench Q&A: https://youtube.com/watch?v=blxN5jYWe8U Benchmark at https://programbench.com

Jahanzaib Ahmed @jahanzaibai

@OfirPress @jyangballin @KLieret @18jeffreyma Letting agents pick the language is the interesting part. Implementation freedom probably reveals more about the agent's reasoning architecture than any fixed-language benchmark ever could.

17h · 22 Views · 1 Like