Scripps Research's Eric Topol says GPT-5, Claude 3.5, and Gemini 2.5 Pro fail clinical readiness · Digg

Posts from X

Eric Topol@EricTopol

Link for free access: https://rdcu.be/fqznS
This extensive assessment work was led by @hoifungpoon and Yu (Aiden) Gu.

Eric Topol@EricTopol

We stress tested many frontier AI models for multimodal medical reasoning (including GPT-5, Claude 3.5, Gemini 2.5 Pro). They’re not ready. Faulty reasoning, use of inappropriate shortcuts, hallucinations. Published today @NatureMedicine https://www.nature.com/articles/s41591-026-04501-8

Max Hodak@maxhodak_

tests 5-generation-old models, concludes AI is inappropriate for medicine

Yishan@yishan

Many people have already pointed out that no matter how high the quality of this paper, the long review and publication cycle makes the results irrelevant.

The way to fix this is to open source enough of the actual research methodology used so that upon publication, anyone can re-run the exact same tests on the latest models at that time to produce consistently comparable results.

This is more useful because “this is the worst the models will ever be,” and it is a reasonable assumption that they at some point WILL be ready. Hence, showing that at some point in time (i.e. a year ago) they weren’t ready is much less useful than constructing a usable method that lets us tell at what point in the future they actually ARE ready.

Timothy Murphy@Timothy98537991

@EricTopol @NatureMedicine It does take time, and I'm not suggesting that your study lacked rigor. The problem is that the target is moving too quickly to be evaluated this way. The results, once finally published, are misleading because they don't represent the state of the art.

Tanishq Mathew Abraham, Ph.D.@iScienceLuvr

This is a great paper. Of course there are valid concerns that the models tested are super old.

Well those models weren't that old when the paper was first released!!

It's just that peer review moves too slow compared to the rate of frontier AI progress.

So pay more attention to the preprints that are coming out in the space, and of course I'm always sharing relevant papers on my feed as they come out :)

Eric Topol@EricTopol

@Timothy98537991 @NatureMedicine Obviously from someone who has never tried to publish such an assessment in a leading peer-reviewed journal. It takes time! Best we can do.

Eric Topol@EricTopol

@yishan @NatureMedicine Agree. They will eventually be ready!

Timothy Murphy@Timothy98537991

@EricTopol @NatureMedicine I learned way too late in life that the truth can defend itself and doesn't need me to defend it. I think it will be pretty obvious to most people that this study was accurate at the time it was done, but by the time it was published, it was meaningless.

Mikhail Doroshenko@SandelloRed

@EricTopol @NatureMedicine Those are not frontier models

Eric Topol@EricTopol

@SandelloRed @NatureMedicine Yes they are

Rogs 🔍🔸@ESRogs

@EricTopol @Timothy98537991 @NatureMedicine Yes, but you could have tweeted "they were not ready" rather than using the present tense.

Vladimir Heiskanen@ValtsuH

@EricTopol @NatureMedicine Would it be possible to rapidly test the same stuff again with the up-to-date models?

I'm not expecting perfection from GPT-5.5 or Opus 4.8 but maybe relevant improvement, still.

Ben Stadler@TheBenStadler

@EricTopol @NatureMedicine Maybe be prepared with personal results of the same tests against current models to supplement your paper’s findings. Otherwise it is fairly meaningless to post, especially with your present tense framing of “they’re not ready.”

Bhushan@bhushan_55

@EricTopol @NatureMedicine The study is outdated now. Get ready for GPT6, Claude Mythos, Gemini 3.5 Pro

Or better, wait for a year

Yishan@yishan

@aykutuz @EricTopol @NatureMedicine Oh HELL YEAH this is exactly what I was asking for! And they already did it!

Simukayi Mutasa M.D.@MutasaSimu70874

@EricTopol @NatureMedicine I love this kind of research Eric, thank you for doing this. How do you suggest we mitigate the publishing delay issue for updating the research on newer models?

Eric Topol@EricTopol

@ESRogs @Timothy98537991 @NatureMedicine The ones we tested were not ready. I indicated some in the text of the post. You’re welcome

Dan Elton@moreisdifferent

@maxhodak_ remarkable lol.

He's known as "Dr. No", because he was against the FDA's original plan to EUA Pfizer's vaccine during the pandemic in mid Oct 2020, and because he lobbied hard against Astrazeneca's vaccine being EUA'd in the US. People died as a result of what he lobbied for.
