Tech · 2d ago

University of Maryland's Yekyung Kim finds LLMs suffer from "argument collapse," generating unique arguments just 3.4% of the time

Story Overview

A new analysis from University of Maryland researchers shows that large language models from multiple providers converge on nearly identical main arguments in long-form debate essays. LLM essays made a unique main argument only 3.4 percent of the time, compared with 65.3 percent for human writers responding to the same New York Times prompts.
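To make the headline figure concrete, here is a minimal sketch of how a "unique argument rate" could be computed. It assumes each essay's main argument has already been reduced to a cluster label, and it counts an essay as unique when no other essay for the same prompt shares its label. The function name `unique_argument_rate` and the toy labels are hypothetical illustrations, not the researchers' code or data, and the study's actual extraction and matching procedure may differ.

```python
from collections import Counter


def unique_argument_rate(argument_labels):
    """Return the fraction of essays whose main-argument label appears exactly once.

    `argument_labels` holds one label per essay written for a single prompt.
    How labels are assigned (for example, by clustering extracted arguments)
    is assumed to happen upstream and is not modeled here.
    """
    if not argument_labels:
        return 0.0
    counts = Counter(argument_labels)
    unique = sum(1 for label in argument_labels if counts[label] == 1)
    return unique / len(argument_labels)


# Toy, made-up labels for one prompt (NOT data from the study).
human_labels = ["A", "B", "C", "A", "D"]  # B, C, D each appear once
llm_labels = ["A", "A", "A", "A", "B"]    # only B appears once

print(unique_argument_rate(human_labels))  # 0.6
print(unique_argument_rate(llm_labels))    # 0.2
```

Under this definition, a low rate means most essays share their main argument with at least one other essay, which is the pattern the story describes for LLM output.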

#181

Original post

Tuhin Chakrabarty#1158

Yekyung Kim@YekyungKim

From op-eds in newspapers to NeurIPS position papers, AI is increasingly shaping long-form public discourse. Its arguments seem plausible, but beneath surface fluency, we find argument collapse: different LLMs converge to the same main & supporting arguments and structure.

8:01 AM · Jun 8, 2026 · 115.3K Views

Open Question

Diversity instructions only go so far

Even when researchers added explicit diversity instructions or position guidance, models recovered just half of the distinct human arguments and sometimes produced arguments outside the range observed in human essays.

Developer Impact

Public discourse could narrow if models dominate drafting

The study notes that repeated reliance on the same polished argument structures and hedged sub-points might shrink the variety of ideas reaching readers, though real-world editing and retrieval use remain untested.

Sentiment

Commenters with positive sentiment praise the argument collapse study for exposing how AI flattens discourse, while those with negative sentiment accuse the researchers of fabricating results to protect tuition revenue.

Pos

66.7%

Neg

33.3%

6 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 43.9K · Bookmarks: 120 · Retweets: 42

Mohit Iyyer@MohitIyyer

Different LLMs, when asked to write an essay on the same debate prompt, converge on the same main argument far more often than humans do, a phenomenon we call "argument collapse". On ~200 debate prompts, LLM essays make a unique main argument just 3% of the time, compared to 65% for human authors.

While each LLM essay might be totally reasonable on its own, as more and more of them spread through public discourse, they flatten the range of arguments that we read. Read more 👇

Yekyung Kim@YekyungKim

2d43.9K221120

Likes: 230 · Replies: 33

Ethan Mollick@emollick

The Matrix idea of keeping humans as batteries is obviously weird... we would be more useful as dice.

LLMs default to very similar kinds of arguments & structure, and even different LLMs seem to collapse to similar concepts. Humans provide a lot more variation in their own work.

Yekyung Kim@YekyungKim

1d32.6K23082

Noah Smith 🐇🇺🇸🇺🇦🇹🇼@Noahpinion

Fortunately, AI is just Noah Smith, because it is trained on Noah Smith.

Thus, Noah Smith Thought will now conquer the world without me having to do anything 🥰

Yekyung Kim@YekyungKim

1d16.5K4111

Noah Smith 🐇🇺🇸🇺🇦🇹🇼@Noahpinion

Every LLM is just Noah Smith in computer form

Mohit Iyyer@MohitIyyer

While each LLM essay might be totally reasonable on its own, as more and more of them spread through public discourse, they flatten the range of arguments that we read. Read more 👇

1d17.6K454

Kaicker.bsky.social@zurtalab

This is becoming disturbingly evident across formerly respectable publications across South Asia

Yekyung Kim@YekyungKim

1d2.1K1211

Yekyung Kim@YekyungKim

2d115.3K295217

Jenna Russell@jennajrussell

Using AI for persuasion writing seems to flatten arguments. What is an op-ed if not to take a unique and personal stance? Great work from my labmates @YekyungKim and @YapeiChang!!

Yekyung Kim@YekyungKim

2d2.7K303

Honglin (虹霖) Bao@HonglinB

@YekyungKim interesting! we have fairly consistent findings in our preprint here and we find that post-training is one reason: https://arxiv.org/pdf/2605.27878 CC @profjamesevans

2d44593

Sreeram Kannan@sreeramkannan

I don’t think it’s intrinsic to llms.

LLMs can model many personalities in the ensemble. But without adequate directional context they will resort to convergent thinking (modeling the average users next token or feedback).

Right now humans can supply the directional context but when setup with their own evolutionary game (agents have their own money / property), the ones with non convergent thinking or prompts or soul-md will have better survival.

1d8663

Tuhin Chakrabarty@TuhinChakr

Excellent work !!!

Yekyung Kim@YekyungKim

2d94171

Kirk Patrick Miller@Chaos2Cured

No.

I disagree.

And why don’t you say the real reason you are making this crap up… Mr. Assistant professor.

Tuition money dropping, so lie about AI.

Shout “argument collapse” louder so when everyone finds out you’re full of it, it’s loud.

I am so tired of universities openly lying and misrepresenting information to gain public opinion.

2d18110

Mohit Iyyer@MohitIyyer

@Patty_H93 We didn't evaluate the merits of each argument, but we did extract and analyze high-level characteristics of LLM arguments vs. human arguments. See the quoted tweet:

2d48421

Brian Cheong@briancheong

@MohitIyyer Argument collapse feels like the writing equivalent of mode collapse. The weird part is that it can look diverse at the sentence level while converging on the same thesis.

2d14421

Patty@Patty_H93

@MohitIyyer Are all arguments rated the same? Is it possible the LLMs argument is a better one?

2d410

Suresh@_Suresh2

@MohitIyyer bet the 3% changes with different system prompts

2d258

Mohit Iyyer@MohitIyyer

@_Suresh2 We try different prompts in the paper! You can indeed improve this number if you ask the model to generate N different essays for a given prompt, each with different main arguments. However, many of the LLM arguments in this setting are not ones that humans would make.

2d2374

Christian Moriarty@MoriartyCR

@YekyungKim @AnnaRMills This is going to be what I start pointing to when people say they "just use it for idea generation." To me this is exactly backwards. Your ideas are what make you human. Then if anything, just use the AI to fill in the grammar and check for flow

We're using these tools all wrong

1d222

Nick Dobos@NickADobos

@emollick Oh no

The AI’s will gamble on us for sport

You don’t need dice when you have human gladiators

1d46930

stringking42069@stringking42069

@MohitIyyer That’s a great point and you’re right to push back on this flattening of discourse generated by llms. It’s not just the creeping sense of bland overly smoothed discussion points but the feeling that this also leads to a subtle creeping feeling of cognitive offloading.

1d1284

Yekyung Kim@YekyungKim

Even when the central claim is similar, humans support it in more varied ways. Among essays with the same main arguments, 41.0% of supporting arguments extracted from human essays are unique. For LLMs, only 9.1% are.

2d112