This Yale + University of Chicago paper shows that the real gap between LLM-generated and human research ideas is not idea quality but idea range: LLMs explore a narrower set of ideas than human researchers do.
The researchers built a controlled test from 11,683 real papers, using each paper’s nearby prior work as the shared starting point.
They asked models to propose a new motivation and method from those same prior papers, then compared those ideas with the real human paper ideas.
Instead of asking whether a single idea looked novel, they labeled each idea by what gap it noticed and what kind of contribution it made.
Human ideas spread across many patterns, such as explaining mechanisms, testing failures, measuring evidence, building systems, and improving efficiency.
Only 12.1% of human ideas were mainly about connecting separate work, but 47.1% to 64.2% of LLM ideas did that, meaning models used this move about 4 to 5 times more often.
Even extra reasoning made this pattern stronger, suggesting models often polish a familiar recipe instead of finding more varied research moves.
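As a quick check on the "about 4 to 5 times" figure, here is a minimal sketch that divides the reported LLM shares by the reported human share. The function and variable names are illustrative assumptions, not anything from the paper; only the three percentages come from the summary above.

```python
# Shares reported in the summary above: percent of ideas whose main move
# is connecting separate lines of work.
HUMAN_SHARE = 12.1
LLM_SHARE_LOW = 47.1
LLM_SHARE_HIGH = 64.2


def overuse_ratio(llm_share: float, human_share: float) -> float:
    """Return how many times more often LLM ideas use a move than human ideas.

    Hypothetical helper for illustration; it is a plain ratio of two shares.
    """
    return llm_share / human_share


low = overuse_ratio(LLM_SHARE_LOW, HUMAN_SHARE)
high = overuse_ratio(LLM_SHARE_HIGH, HUMAN_SHARE)
print(f"{low:.1f}x to {high:.1f}x")  # prints "3.9x to 5.3x"
```

So the exact ratios run from roughly 3.9x to 5.3x, which the summary rounds to "about 4 to 5 times".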
---
– arxiv.org/abs/2607.01233
Title: "Measuring the Gap Between Human and LLM Research Ideas"







