Just read in The Atlantic that, according to a study, "using AI did not significantly improve patients’ ability to diagnose themselves or others."
Went to go look up the models used in the study.
Hmmm. 🤔
The study evaluated models up to two years outdated
Many users dismissed the study questioning AI medical diagnosis as flawed and embarrassing because it relied on outdated models like Llama 3 and 4o that they called worthless garbage.

@lthlnkso @pmarca How is a source of their choice a control?

@YachtBrother Yeah, I don't fault the researchers. I'm sure it takes time to do the research, get the data, write your paper, get it reviewed, published, etc.
I think the problem is people who base current opinions off the capability of old models.

@lthlnkso We covered this when the study came out and reproduced the results with a newer stack of models.
This is exactly why healthcare AI needs independent, up-to-date evals. "AI" performance depends heavily on the model, benchmark, and clinical task.

@Austen @pmarca The control is like: "If you had these symptoms and were trying to figure out what disease you have, use whatever sources you would use at home (e.g. google)."
So it compares people using an LLM with people using whatever sources they normally would.

@lthlnkso To play devil's advocate, the reason they're using outdated models is that these models weren't outdated when the study was completed (studies take a while to go from research underway to published article).
I'm sure a 2nd look would fare better

@lthlnkso Always the same.
I look forward to the day when research like this is conducted with models that aren't 1-2 years out of date

@lthlnkso It's unreal how this keeps happening. My understanding is the excuse for this is that studies take a while to become public or go through review, so many were conducted when these models were still relevant.

@lthlnkso At this point it's every single research study that exists. It's so prevalent that I simply do not believe the conclusions of literally ANY study unless I have taken the time to sit down, read through the results and methodology sections, and agree with them.

@lthlnkso Except when these models came out people would expect that they would make it better.
The claims of each iteration are the same at the time they come out and yet retroactive studies keep finding stuff like this.
Not pooh poohing AI but the hype train gets old.

Disagree that science (or medicine) doesn't go off of vibes.
Towards the end of your article you reference a survey reporting that 81% of doctors say AI is used in their practice. Most of the surveyed doctors also report AI being useful. Are those doctors using Llama 3?
Probably not! Probably they are using the latest and greatest models, and that's probably based on vibes (and availability). Likely too for medical researchers using AI.
When you're actually doing science or medicine, you have to go off of vibes to some extent because there's not a relevant RCT to guide every choice.
Finally, since you are here, my impression reading your article is that it was kind of negative towards AI. AI is "worming" its way in, the FDA hasn't reviewed it, it could mislead patients, etc. In my opinion, your article is cast to cause the reader to worry about AI.
The choice to be negative is seemingly in conflict with most of your colleagues in the field who are overwhelmingly adopting AI and feel (mostly) positive about it and its applications as illustrated by the survey you briefly mentioned at the end of the article.
I'm curious if you agree with my characterization of the tone and why you made the choice to write that way if so.

@lthlnkso Is this a classic case of "why are you using those models given way better ones exist" that too much anti AI research falls under?

@lthlnkso This is from my story.
The Science paper I open with that showed AI outperforming physicians was using ChatGPT o1 from 2024. Is that also invalid?
This study was published just 4 months ago. The thing with science is I don't get to go off vibes.
Do another RCT!

@lotrlover23468 @lthlnkso Completely false.
But go on.

@lthlnkso Models two years old and up. Wow. They need to replace their journalists with GPT 5.5

@lthlnkso Subjectively finding a tool useful and objectively showing improved outcomes are different issues.
The story reports on an interesting press conference where a leading medical AI scientist publishing a major positive AI study spent much of the time cautioning against AI overuse.

@lthlnkso Next topic idea: How many deaths of children did Elon Musk's USAID cuts cause? Leftists are claiming 68 million children and 187 million total.

@lthlnkso @Austen @pmarca Was this study before or after Google implemented AI Overviews?

@mount_tai70 I have been reading about that. I may post about it, but it is challenging as the countries that receive lots of USAID also tend to have bad systems for tracking and reporting deaths, so it is hard to know how many people are dying (and of what) there.

@YachtBrother @lthlnkso Llama 3 has literally NEVER been a leading model in any way. And even so, because these models are outdated and completely worthless, journalistic integrity would be to pull the article entirely.