Google DeepMind's Séb Krier and other critics argue an AI self-diagnosis study is flawed for using obsolete models

The study evaluated models that were up to two years out of date.

Original post

Quick Thoughts @lthlnkso

Just read in The Atlantic that, according to a study, "using AI did not significantly improve patients’ ability to diagnose themselves or others."

Went to go look up the models used in the study.

Hmmm. 🤔

2:16 PM · Jun 23, 2026 · 216.9K Views

Sentiment

Many users dismissed the study, which questioned AI medical diagnosis, as flawed and embarrassing because it relied on outdated models such as Llama 3 and GPT-4o, which they called "worthless garbage."

Positive: 0.0% · Negative: 100.0% (10 comments with sentiment)

Posts from X

Austen Allred @Austen

@lthlnkso @pmarca How is a source of their choice a control?

Quick Thoughts @lthlnkso

@YachtBrother Yeah, I don't fault the researchers. I'm sure it takes time to do the research, get the data, write your paper, get it reviewed, published, etc.

I think the problem is people who base current opinions off the capability of old models.

Medical Sphere @MedicalSphereAI

@lthlnkso We covered this when the study came out and reproduced the results with a newer stack of models.

This is exactly why healthcare AI needs independent, up-to-date evals. "AI" performance depends heavily on the model, benchmark, and clinical task.

Quick Thoughts @lthlnkso

@Austen @pmarca The control is like: "If you had these symptoms and were trying to figure out what disease you have, use whatever sources you would use at home (e.g. google)."

So, it's comparing using an LLM to people who are using whatever they would normally.

YachtBrotherMD @YachtBrother

@lthlnkso To play devil's advocate, the reason they're using outdated models is that these models weren't outdated when the study was completed (studies take a while to go from research underway to published article).

I'm sure a second look would fare better.

Andrew Curran @AndrewCurran_

@lthlnkso Always the same.

xlr8harder @xlr8harder

I look forward to the day when research like this is conducted with models that aren't 1-2 years out of date

(Quoting the original post by Quick Thoughts @lthlnkso, reproduced above.)

Romlib 🎄 @romlib_

@lthlnkso It's unreal how this keeps happening. My understanding is that the excuse is that studies take a while to become public or go through review, so many were conducted when these models were still relevant.

me, B.S. @normal_brandon1

@lthlnkso At this point it's every single research study that exists. It's so prevalent that I simply do not believe the conclusions of literally ANY study unless I have taken the time to sit down, read through the results and methodology sections, and agree with them.

Numenorean Blacksmith @lotrlover23468

@lthlnkso Except that when these models came out, people expected them to make it better.

The claims for each iteration are the same when it comes out, and yet retroactive studies keep finding stuff like this.

Not pooh-poohing AI, but the hype train gets old.

Quick Thoughts @lthlnkso

Disagree that science (or medicine) doesn't go off of vibes.

Towards the end of your article you reference a survey reporting that 81% of doctors say AI is used in their practice. Most of the surveyed doctors also report AI being useful. Are those doctors using Llama 3?

Probably not! Probably they are using the latest and greatest models, and that's probably based on vibes (and availability). Likely too for medical researchers using AI.

When you're actually doing science or medicine, you have to go off of vibes to some extent because there's not a relevant RCT to guide every choice.

Finally, since you are here, my impression reading your article is that it was kind of negative towards AI. AI is "worming" its way in, the FDA hasn't reviewed it, it could mislead patients, etc. In my opinion, your article is cast to cause the reader to worry about AI.

The choice to be negative is seemingly in conflict with most of your colleagues in the field who are overwhelmingly adopting AI and feel (mostly) positive about it and its applications as illustrated by the survey you briefly mentioned at the end of the article.

I'm curious if you agree with my characterization of the tone and why you made the choice to write that way if so.

dfadsdaf @dfadsdaf94761

@lthlnkso Is this a classic case of "why are you using those models when way better ones exist," which too much anti-AI research falls under?

Benjamin Mazer @BenMazer

@lthlnkso This is from my story.

The Science paper I open with that showed AI outperforming physicians was using ChatGPT o1 from 2024. Is that also invalid?

This study was published just 4 months ago. The thing with science is I don't get to go off vibes.

Do another RCT!

Basil Frankweiler @BasilFranken

@lotrlover23468 @lthlnkso Completely false.

But go on.

dmitriy @DmitriyLeybel

@lthlnkso Models two years old and up. Wow. They need to replace their journalists with GPT 5.5

Benjamin Mazer @BenMazer

@lthlnkso Subjectively finding a tool useful and objectively showing improved outcomes are different issues.

The story reports on an interesting press conference where a leading medical AI scientist publishing a major positive AI study spent much of the time cautioning against AI overuse.

Mount Tai @mount_tai70

@lthlnkso Next topic idea: How many children's deaths did Elon Musk's USAID cuts cause? Leftists are claiming 68 million children and 187 million total.

John Shearin @jshearin01

@lthlnkso @Austen @pmarca Was this study before or after Google implemented AI Overviews?

Quick Thoughts @lthlnkso

@mount_tai70 I have been reading about that. I may post about it, but it is challenging as the countries that receive lots of USAID also tend to have bad systems for tracking and reporting deaths, so it is hard to know how many people are dying (and of what) there.

Nairebis - e/max-acc @Nairebis

@YachtBrother @lthlnkso Llama 3 has literally NEVER been a leading model in any way. And even so, because these models are outdated and completely worthless, journalistic integrity would be to pull the article entirely.
