Nature Medicine study finds general-purpose LLMs like Gemini 3.1 outperform specialized medical AI on clinical tasks

Twelve physicians ranked Gemini 3.1 highest in blinded clinical tasks

Original post

Eric Topol@EricTopol

The overall ranking. Congratulations to @ekoermann @krithikvish and their team @nyulangone for getting this done. We need more of these rigorous assessments.

Eric Topol@EricTopol

Here is the performance breakdown for each model's blinded assessment for 4 major tasks: (1) clinical correctness, (2) completeness, (3) safety, and (4) clarity.

8:22 AM · Jun 13, 2026 · 5.5K Views

Sentiment

Commenters are positive: one of the researchers congratulated in the post is honored by the recognition and hopes the work will motivate building a stronger evidence base for deploying AI tools in medicine.

Pos 100.0% · Neg 0.0%

2 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

VIEWS 1.2K · LIKES 6

Rohan Paul@rohanpaul_ai

https://www.nature.com/articles/s41591-026-04431-5

23h · 1.2K views · 6 likes

RETWEETS 15

Rohan Paul@rohanpaul_ai

A Nature Medicine study found general-purpose LLMs are now outperforming dedicated medical AI products on physician-reviewed clinical tasks.

The authors compared OpenEvidence and UpToDate Expert AI with GPT-5.2, Gemini 3.1 Pro, and Claude Opus 4.6 on medical exam questions, clinician-style answers, and real questions doctors asked during care.

In 100 de-identified physician questions from live clinical use, blinded clinicians again preferred the frontier models, especially on completeness and clarity,

23h4.7K6921

REPLIES 1

Krithik Vishwanath@krithikvish

@EricTopol @ekoermann @nyulangone Honored by the shoutout! Hoping that this work encourages/motivates building a stronger evidence base when deploying tools in medicine.

3h121

Shinka - AI@ShinkaIoT

@rohanpaul_ai This is why the 'platform vs. product' debate keeps coming back: frontier models are becoming the platform for every other AI product.

23h394

Suresh@_Suresh2

@rohanpaul_ai dedicated tools are on a yearly update cycle, general models just ate the latest pubmed. exam questions reflect that lag perfectly

15h27

murc@_murc_

@rohanpaul_ai wonder if they are better than https://www.doctronic.ai/

22h65

Winston B.@DoDataThings

Frontier models sweeping vertical wrappers in clinical follows the same arc legal and code already saw. Gemini 3.1 Pro at 97.4 percent on MedQA means the OpenEvidence advantage was capability-borrowed, on a timer. The next 12 months for clinical AI startups look like a distribution race now.

22h49

Eric Topol@EricTopol

@krithikvish @ekoermann @nyulangone 💯

3h10

Cefte@Cefte

@_Suresh2 @rohanpaul_ai if you think a year's lag in pubmed ingest is appreciable to most clinicians, you are somewhat optimistic

13h6

Locale Network 🏡@LocaleNet

@rohanpaul_ai Purpose built tools finding out intelligence is becoming commoditized.

22h