BrowserCode with Opus 4.7 Scores 70% on New CMU 'Odysseys' Benchmark, Surpassing Microsoft WebWright's 61%

Original post
Russ Salakhutdinov (#38, @RSALAKHU) · OP: Alexander Yue (@ALEZANDER907)

New benchmark "Odysseys" by CMU. Microsoft recently topped the chart at 61% with their auto-eval agent WebWright. Today I scored a 70% with BrowserCode and Opus 4.7

3:21 PM · May 26, 2026

Sentiment

Positive: 100%
Negative: 0%

Users are impressed by the quality of the tasks in the new CMU Odysseys benchmark, on which BrowserCode with Opus 4.7 topped the chart at 70%.

1 comment with sentiment.

Cluster engagement

33 snapshots