Nature Magazine shows that generalized models (eg: Gemini / GPT / Opus) beat best-in-class specialist models (eg: OpenEvidence) on medical benchmarks.
This is nothing new, and can be explained in three words: The bitter lesson.
Academic Kawin Ethayarajh disputed the 'bitter lesson' framing.
Users cite general LLMs beating specialist medical AI on benchmarks as confirmation that the bitter lesson, that broad methods outperform narrow ones, remains undefeated.

Source: https://www.nature.com/articles/s41591-026-04431-5/figures/2
@gokulr This is not what the bitter lesson is ...

@gokulr Bro, turns out the best medical expert is just the model that read everything, not the one trained to feel confident about one thing.

I think part of the reason is that many neural paths in generalized models lead to outcomes, whereas in specialized models only the specialized paths do. Both have their value, though. I do feel like theoretical science will learn to use generalized models in the early phases and then move to specialized models once a direction has been established.

@gokulr specialists win on narrow tasks, but generalists eat the benchmarks when distribution shifts. bitter lesson, yeah.

@gokulr bitter lesson remains undefeated

@gokulr The bitter lesson again. Specialist workflow still matters, but the benchmark lead is going to the frontier models.