
Ofir Press says preliminary ProgramBench results show Opus 4.8 is only marginally better than Opus 4.7.

Testing is ongoing to complete the full evaluation of the model.

Original post
Lisan al Gaib @scaling01 · #980 in AI

@OfirPress @jyangballin we need Gemini 3.5 Flash and Opus 4.8 scores 👉👈

Ofir Press @OfirPress

@jyangballin Full ProgramBench Q&A: https://www.youtube.com/watch?v=blxN5jYWe8U

Full benchmark at https://programbench.com/

8:09 AM · Jun 1, 2026 · 153 Views
Posts from X
Ofir Press @OfirPress

@scaling01 @jyangballin We're on it. Opus 4.8 has been partially run, and it seems to be only very marginally better than 4.7.
