A lot has ALREADY been written about the NYU @EvidenceOpen @UpToDate Expert AI study, but I wanted to give my perspective as what counts as an "expert" in human-computer interaction these days. Especially when I see Twitter debates about item response theory. 🤣
A 🧵⬇️
For medical information, general-purpose frontier AI models (Google, OpenAI, Anthropic) outperformed the specialized @EvidenceOpen and @UpToDate tools, as assessed by 12 US clinicians (randomized and blinded to which model they were rating) and by extensive testing/benchmarks. This was not anticipated. @NatureMedicine https://www.nature.com/articles/s41591-026-04431-5

