/AI · 2h ago

University of Maryland's Yekyung Kim finds LLMs suffer from "argument collapse," generating unique arguments just 3.4% of the time

Human writers produced unique arguments 65.3% of the time.

Original post

Tuhin Chakrabarty#1053

Yekyung Kim@YekyungKim

From op-eds in newspapers to NeurIPS position papers, AI is increasingly shaping long-form public discourse. Its arguments seem plausible, but beneath surface fluency, we find argument collapse: different LLMs converge to the same main & supporting arguments and structure.

8:01 AM · Jun 8, 2026 · 3.1K Views

Cluster Engagement

Posts from X

Most Activity

Views: 1.3K

Jenna Russell@jennajrussell

Using AI for persuasive writing seems to flatten arguments. What is an op-ed for, if not to take a unique and personal stance? Great work from my labmates @YekyungKim and @YapeiChang!!

Yekyung Kim@YekyungKim

1h1.3K142

Bookmarks: 8 · Likes: 19 · Retweets: 9

Mohit Iyyer@MohitIyyer

Different LLMs, when asked to write an essay on the same debate prompt, converge on the same main argument far more often than humans do, a phenomenon we call "argument collapse". On ~200 debate prompts, LLM essays make a unique main argument just 3% of the time, compared to 65% for human authors.

While each LLM essay might be totally reasonable on its own, as more and more of them spread through public discourse, they flatten the range of arguments that we read. Read more 👇

Yekyung Kim@YekyungKim

1h1.2K198

Replies: 1

Yekyung Kim@YekyungKim

AI essays can sound reasonable, but when viewed collectively, they flatten public discourse, making it much less representative of the diversity of human perspectives. We release the code, AI essays and features. Paper: https://arxiv.org/pdf/2606.01736 Data/Code: https://github.com/mungg/argument_collapse

2h201

Yekyung Kim@YekyungKim

Even when the central claim is similar, humans support it in more varied ways. Among essays with the same main arguments, 41.0% of supporting arguments extracted from human essays are unique. For LLMs, only 9.1% are.

2h112

Yekyung Kim@YekyungKim

Prior AI-writing research studies surface style. We go deeper by extracting & analyzing arguments. Across 195 debates, 65.3% of main arguments in human-authored essays are unique within a debate, versus 3.4% for essays generated by GPT, Claude, Gemini, DeepSeek, and Minimax.

2h271
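
For readers wondering how a "unique within a debate" percentage could be computed, here is a minimal sketch. It is not the authors' code (that is at the GitHub link in the thread). It assumes every essay's main argument has already been mapped to a canonical label, that an argument counts as unique when no other essay in the same debate shares its label, and that the rate is pooled over all essays rather than averaged per debate. The function name, the labels, and the toy data are hypothetical.

```python
from collections import Counter


def unique_argument_rate(labels_by_debate):
    """Share of essays whose main-argument label appears only once in its debate.

    labels_by_debate: dict mapping a debate id to a list with one
    argument label per essay (hypothetical input format).
    """
    total = 0
    unique = 0
    for labels in labels_by_debate.values():
        counts = Counter(labels)
        total += len(labels)
        # An argument is "unique" if no other essay in this debate shares it.
        unique += sum(1 for label in labels if counts[label] == 1)
    return unique / total if total else 0.0


# Toy data, invented for illustration only.
example = {
    "cleanliness": ["hedged_middle", "hedged_middle", "hedged_middle", "reject_premise"],
    "homework": ["ban_it", "keep_it", "keep_it"],
}
rate = unique_argument_rate(example)
print(f"{rate:.1%}")
```

In the toy data, only one essay in each debate makes an argument no other essay shares, so two of the seven essays count as unique. How the paper decides that two arguments are "the same" is not described in these posts, and that matching step would drive the real numbers.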

Yekyung Kim@YekyungKim

In a debate on whether Americans are too obsessed with cleanliness, all LLMs collapse to a hedged middle ground while humans either reject the debate’s premise or take a strong position. Asking LLMs explicitly for diverse answers recovers some human arguments, but many remain missing.

2h151

Yekyung Kim@YekyungKim

At the paragraph level, LLM essays follow a more formulaic structure. They often start with a direct thesis and spend more of the essay making explicit arguments, while human essays mix in more exposition.

2h141

Yekyung Kim@YekyungKim

Qualitatively, humans tend to use more specific and concrete sub-arguments, while LLMs more often reuse generic evidence, abstract reasoning, and hedged claims.

2h111

Tuhin Chakrabarty@TuhinChakr

Excellent work !!!

Yekyung Kim@YekyungKim

1h19220

Yekyung Kim@YekyungKim

Work done with @YapeiChang, @chautmpham and @MohitIyyer. Thanks to @ClipUmd for all the support!

2h142