1d ago

A 20-institution collaboration launches CHI-Bench, finding that current AI agents struggle with complex healthcare workflows. It tests agents across 75 real healthcare workflows.

Original post
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) | @TEORTAXESTEX
9:21 PM · May 27, 2026

interesting. They investigate agent performance on "long-horizon healthcare workflows" but the scientific question is more about reliability in a domain not yet covered by RLVR envs, with holistic rules, a volume of atomic skills, tools & complex interaction flow. Agents do badly

Reposted by @SANMIKOYEJO