1d ago

A 20-institution collaboration launches CHI-Bench, finding that current AI agents struggle with complex healthcare workflows

It tests agents across 75 real healthcare workflows.

Original post

#420 Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) @TEORTAXESTEX

interesting. They investigate agent performance on "long-horizon healthcare workflows" but the scientific question is more about reliability in a domain not yet covered by RLVR envs, with holistic rules, a volume of atomic skills, tools & complex interaction flow. Agents do badly

9:21 PM · May 27, 2026

Reposted by

#1085 @SANMIKOYEJO