SWE-bench creator Ofir Press outlines a cyclical framework where benchmark development drives language model progress
The cycle maps how benchmarks guide model pretraining and scaffolding
Posts from X
Peter Henderson@PeterHndrsn
This is the "outer" reinforcement learning loop.
Ofir Press@OfirPress
slide from my current talk:
4h | Views 832 | Likes 8 | Bookmarks 1
Andrew M. Dai@AndrewDai
@OfirPress I think you're missing one more step before new evals are created: benchmaxxing.
Ofir Press@OfirPress
slide from my current talk:
3h | Views 274 | Likes 4 | Bookmarks 1