1d ago

A 20-institution collaboration launches CHI-Bench, finding that current AI agents struggle with complex healthcare workflows

It tests agents across 75 real healthcare workflows.

Original post

Interesting. They investigate agent performance on "long-horizon healthcare workflows," but the scientific question is more about reliability in a domain not yet covered by RLVR envs, with holistic rules, a volume of atomic skills, tools & complex interaction flow. Agents do badly.

9:21 PM · May 27, 2026