LisanBench creator Lisan al Gaib says a three-agent Opus 4.8 team solves ProgramBench tasks 1.8 times faster than a single agent
The three-agent team reached a 60% hidden-test pass rate
REPLY
#716 elie @ELIEBAKOUCH
@scaling01 very nice indeed
this is the first time we see some good multi-agent evals on a launch
5:18 PM · May 28, 2026 · 3.9K Views
6:06 PM · May 28, 2026 · 875 Views