From op-eds in newspapers to NeurIPS position papers, AI is increasingly shaping long-form public discourse. Its arguments seem plausible, but beneath surface fluency, we find argument collapse: different LLMs converge to the same main & supporting arguments and structure.
University of Maryland's Yekyung Kim finds that LLMs suffer from "argument collapse," producing a unique main argument just 3.4% of the time.
Human writers produced unique main arguments 65.3% of the time.
Using AI for persuasive writing seems to flatten arguments. What is an op-ed for, if not to take a unique and personal stance? Great work from my labmates @YekyungKim and @YapeiChang!!
Different LLMs, when asked to write an essay on the same debate prompt, converge on the same main argument far more often than humans do, a phenomenon we call "argument collapse". On ~200 debate prompts, LLM essays make a unique main argument just 3% of the time, compared to 65% for human authors.
While each LLM essay might be totally reasonable on its own, as more and more of them spread through public discourse, they flatten the range of arguments that we read. Read more 👇

AI essays can sound reasonable, but viewed collectively, they flatten public discourse, making it much less representative of the diversity of human perspectives. We release the code, the AI-generated essays, and the features. Paper: https://arxiv.org/pdf/2606.01736 Data/Code: https://github.com/mungg/argument_collapse

Even when the central claim is similar, humans support it in more varied ways. Among essays that make the same main argument, 41.0% of supporting arguments extracted from human essays are unique. For LLMs, only 9.1% are.

Prior AI-writing research studies surface style. We go deeper by extracting & analyzing arguments. Across 195 debates, 65.3% of main arguments in human-authored essays are unique within a debate, versus 3.4% for essays generated by GPT, Claude, Gemini, DeepSeek, and Minimax.
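For readers curious how a "unique within a debate" rate can be computed, here is a minimal sketch. It is not the authors' code: it assumes each essay's main argument has already been mapped to a cluster label (how arguments are extracted and matched is not specified here), treats an argument as unique when no other essay in the same debate shares its label, and pools counts across debates. The function name `unique_rate` and the toy labels are hypothetical.

```python
from collections import Counter


def unique_rate(debates: dict[str, list[str]]) -> float:
    """Fraction of main arguments, pooled over all debates, whose
    (hypothetical, pre-assigned) label occurs exactly once in its debate."""
    total = 0
    unique = 0
    for labels in debates.values():
        counts = Counter(labels)  # how many essays in this debate share each label
        total += len(labels)
        unique += sum(1 for label in labels if counts[label] == 1)
    return unique / total if total else 0.0


# Hypothetical toy data: one main-argument label per essay, grouped by debate.
debates = {
    "debate_a": ["middle_ground", "middle_ground", "middle_ground", "reject_premise"],
    "debate_b": ["productivity", "productivity", "isolation"],
}
print(unique_rate(debates))
```

In this toy example, two of the seven main arguments are unique, so the rate is 2/7. Averaging per-debate rates instead of pooling would be an equally plausible design choice and can give a different number.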

In a debate on whether Americans are too obsessed with cleanliness, all LLMs collapse to a hedged middle ground, while humans either reject the debate's premise or take a strong position. Explicitly asking LLMs for diverse answers recovers some human arguments, but many remain missing.

At the paragraph level, LLM essays follow a more formulaic structure. They often start with a direct thesis and spend more of the essay making explicit arguments, while human essays mix in more exposition.

Qualitatively, humans tend to use more specific and concrete sub-arguments, while LLMs more often reuse generic evidence, abstract reasoning, and hedged claims.
Excellent work!!!

Work done with @YapeiChang, @chautmpham and @MohitIyyer. Thanks to @ClipUmd for all the support!