Ofir Press says preliminary ProgramBench results show Opus 4.8 is only marginally better than Opus 4.7

Testing is still in progress, and the full model evaluation has not yet been completed.


Original post

Lisan al Gaib (@scaling01)

@OfirPress @jyangballin we need Gemini 3.5 Flash and Opus 4.8 scores 👉👈

@jyangballin Full ProgramBench Q&A: https://www.youtube.com/watch?v=blxN5jYWe8U

Full benchmark at https://programbench.com/

8:09 AM · Jun 1, 2026 · 153 Views

Posts from X

38 Views · 1 Like

@scaling01 @jyangballin We're on it. Opus 4.8 has been partially ran and it seems to be only very marginally better than 4.7
