
LisanBench creator Lisan al Gaib says a three-agent Opus 4.8 team solves ProgramBench tasks 1.8 times faster than a single agent

The three-agent team reached a 60% hidden-test pass rate

Original post

this is the first time we see some good multi-agent evals on a launch

10:18 AM · May 28, 2026

@scaling01 very nice indeed

Lisan al Gaib @scaling01

this is the first time we see some good multi-agent evals on a launch

5:18 PM · May 28, 2026 · 3.9K Views
6:06 PM · May 28, 2026 · 875 Views