From op-eds in newspapers to NeurIPS position papers, AI is increasingly shaping long-form public discourse. Its arguments seem plausible, but beneath surface fluency, we find argument collapse: different LLMs converge to the same main & supporting arguments and structure.
University of Maryland's Yekyung Kim finds LLMs suffer from "argument collapse," generating unique arguments just 3.4% of the time
Story Overview
New analysis from University of Maryland researchers shows that large language models from multiple providers converge on nearly identical main arguments in long-form debate essays. The models produce a unique main argument only 3.4 percent of the time, compared with 65.3 percent for human writers responding to the same New York Times prompts.
Diversity instructions only go so far
Even when researchers added explicit variety prompts or position guidance, models recovered just half of the distinct human arguments and sometimes produced outputs outside the observed human range.
Public discourse could narrow if models dominate drafting
The study notes that repeated reliance on the same polished argument structures and hedged sub-points might shrink the variety of ideas reaching readers, though real-world editing and retrieval use remain untested.
Supportive users praise the LLM argument collapse study for exposing how AI flattens the depth of discourse, while critical users accuse the researchers of fabricating results to protect tuition revenue.
Most Activity
Different LLMs, when asked to write an essay on the same debate prompt, converge on the same main argument far more often than humans do, a phenomenon we call "argument collapse". On ~200 debate prompts, LLM essays make a unique main argument just 3% of the time, compared to 65% for human authors.
While each LLM essay might be totally reasonable on its own, as more and more of them spread through public discourse, they flatten the range of arguments that we read. Read more 👇
The Matrix idea of keeping humans as batteries is obviously weird... we would be more useful as dice.
LLMs default to very similar kinds of arguments & structure, and even different LLMs seem to collapse to similar concepts. Humans provide a lot more variation in their own work.
Fortunately, AI is just Noah Smith, because it is trained on Noah Smith.
Thus, Noah Smith Thought will now conquer the world without me having to do anything 🥰
Every LLM is just Noah Smith in computer form
This is becoming disturbingly evident across formerly respectable publications across South Asia
Using AI for persuasive writing seems to flatten arguments. What is an op-ed if not to take a unique and personal stance? Great work from my labmates @YekyungKim and @YapeiChang!!

@YekyungKim interesting! we have fairly consistent findings in our preprint here and we find that post-training is one reason: https://arxiv.org/pdf/2605.27878 CC @profjamesevans

I don’t think it’s intrinsic to llms.
LLMs can model many personalities in the ensemble. But without adequate directional context they will resort to convergent thinking (modeling the average user's next token or feedback).
Right now humans can supply the directional context, but when set up with their own evolutionary game (agents have their own money / property), the ones with non-convergent thinking or prompts or soul-md will have better survival.
Excellent work !!!

No.
I disagree.
And why don’t you say the real reason you are making this crap up… Mr. Assistant professor.
Tuition money dropping, so lie about AI.
Shout “argument collapse” louder so when everyone finds out you’re full of it, it’s loud.
I am so tired of universities openly lying and misrepresenting information to gain public opinion.

@Patty_H93 We didn't evaluate the merits of each argument, but we did extract and analyze high-level characteristics of LLM arguments vs. human arguments. See the quoted tweet:

@MohitIyyer Argument collapse feels like the writing equivalent of mode collapse. The weird part is that it can look diverse at the sentence level while converging on the same thesis.

@MohitIyyer Are all arguments rated the same? Is it possible the LLMs' argument is a better one?

@MohitIyyer bet the 3% changes with different system prompts

@_Suresh2 We try different prompts in the paper! You can indeed improve this number if you ask the model to generate N different essays for a given prompt, each with different main arguments. However, many of the LLM arguments in this setting are not ones that humans would make.

@YekyungKim @AnnaRMills This is going to be what I start pointing to when people say they "just use it for idea generation." To me this is exactly backwards. Your ideas are what make you human. Then if anything, just use the AI to fill in the grammar and check for flow
We're using these tools all wrong
@emollick Oh no
The AIs will gamble on us for sport
You don’t need dice when you have human gladiators

@MohitIyyer That’s a great point and you’re right to push back on this flattening of discourse generated by LLMs. It’s not just the creeping sense of bland, overly smoothed discussion points but the feeling that this also leads to subtle cognitive offloading.

Even when the central claim is similar, humans support it in more varied ways. Among essays with the same main arguments, 41.0% of supporting arguments extracted from human essays are unique. For LLMs, only 9.1% are.
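
For readers wondering what a "unique argument" percentage could mean in practice, here is a rough sketch. The thread does not say how the paper measures uniqueness, so this is an assumption, not the authors' method: it supposes each essay's argument has already been mapped to a canonical label, and counts an argument as unique when no other essay in the same group shares that label. The function name `unique_rate` and the example labels are made up for illustration and are not data from the study.

```python
from collections import Counter


def unique_rate(labels):
    """Fraction of items whose label appears exactly once in the group.

    Assumption: `labels` holds one canonical argument label per essay,
    produced by some earlier (unspecified) extraction step.
    """
    if not labels:
        return 0.0
    counts = Counter(labels)
    singletons = sum(1 for label in labels if counts[label] == 1)
    return singletons / len(labels)


# Invented labels for a single debate prompt (illustrative only).
human = ["ban phones", "teach self-control", "parents decide", "ban phones"]
llm = ["balanced policy", "balanced policy", "balanced policy", "balanced policy"]

print(unique_rate(human))  # two of four labels occur once
print(unique_rate(llm))    # every essay shares one label
```

Under this definition, a group where every essay makes the same point scores zero no matter how different the wording is, which matches the reply above about essays looking diverse at the sentence level while converging on the same thesis.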