
Replit president Michele Catasta launches ViBench, an open-source benchmark evaluating AI agents on end-to-end web application development

Opus 4.8 led the leaderboard with an 87.8% score.

Original post
Michele Catasta (@pirroh) · #1818 in AI

Most AI coding benchmarks miss what actually matters: how models perform at the application layer.

Introducing ViBench, an open-source benchmark for evaluating agents on end-to-end web application development.

10:50 AM · Jun 2, 2026 · 19.5K Views
Posts from X
Views 8K · Bookmarks 12 · Likes 50 · Retweets 4 · Replies 5

SWE benchmarks don’t necessarily capture app building capabilities. ViBench does.

