Tech · 6h ago

A new study finds frontier LLMs suffer from "argument collapse," converging on identical opinions far more than humans

LLMs generated unique arguments in just 3.4% of essays.

Original post

Mohit Iyyer@MohitIyyer

Different LLMs, when asked to write an essay on the same debate prompt, converge on the same main argument far more often than humans do, a phenomenon we call "argument collapse". On ~200 debate prompts, LLM essays make a unique main argument just 3% of the time, compared to 65% for human authors.

While each LLM essay might be totally reasonable on its own, as more and more of them spread through public discourse, they flatten the range of arguments that we read. Read more 👇

Yekyung Kim@YekyungKim

From op-eds in newspapers to NeurIPS position papers, AI is increasingly shaping long-form public discourse. Its arguments seem plausible, but beneath surface fluency, we find argument collapse: different LLMs converge to the same main & supporting arguments and structure.

8:57 AM · Jun 8, 2026 · 40.6K Views
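To make the headline numbers concrete, here is a minimal sketch of how a "unique main argument" rate could be computed. It assumes each essay's main argument has already been reduced to a label, and it counts an essay as unique when no other essay on the same prompt shares its label. That definition, the function name `unique_argument_rate`, and the toy data are all assumptions for illustration; the post does not describe the study's actual procedure.

```python
from collections import Counter


def unique_argument_rate(labels_by_prompt):
    """Fraction of essays whose main-argument label is shared with no other
    essay on the same prompt (an assumed definition, not the study's)."""
    unique = 0
    total = 0
    for labels in labels_by_prompt.values():
        counts = Counter(labels)
        # An essay counts as unique if its label appears exactly once.
        unique += sum(1 for label in labels if counts[label] == 1)
        total += len(labels)
    return unique / total if total else 0.0


# Hypothetical toy data: argument labels for four essays per prompt.
llm_labels = {
    "prompt_1": ["A", "A", "A", "A"],
    "prompt_2": ["B", "B", "B", "C"],
}
human_labels = {
    "prompt_1": ["A", "D", "E", "A"],
    "prompt_2": ["B", "F", "G", "H"],
}

llm_rate = unique_argument_rate(llm_labels)      # 1 unique essay out of 8
human_rate = unique_argument_rate(human_labels)  # 6 unique essays out of 8
print(f"LLM: {llm_rate:.3f}, human: {human_rate:.3f}")
```

The hard part in practice is deciding when two essays make "the same" main argument; this sketch takes those labels as given.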

Sentiment

Positive commenters praise the research for showing that LLMs exhibit far more argument collapse than humans in debate essays and for highlighting how AI flattens discourse; negative commenters react with hostility toward the researchers.

Positive: 66.7%

Negative: 33.3%

3 comments with sentiment.

Posts from X

Mohit Iyyer@MohitIyyer

@Patty_H93 We didn't evaluate the merits of each argument, but we did extract and analyze high-level characteristics of LLM arguments vs. human arguments. See the quoted tweet:

1d · Bookmarks: 1

Pranesh Prakash@pranesh

@MohitIyyer Would you agree with this framing: "The same traits of LLMs that tend to make them useful (on average, not in every instance) as fact-checkers also tend to lead to argument collapse"?

10h · Likes: 10 · Retweets: 2

Kirk Patrick Miller@Chaos2Cured

No.

I disagree.

And why don’t you say the real reason you are making this crap up… Mr. Assistant professor.

Tuition money dropping, so lie about AI.

Shout “argument collapse” louder so when everyone finds out you’re full of it, it’s loud.

I am so tired of universities openly lying and misrepresenting information to gain public opinion.

1d · Replies: 1

Gregor@bygregorr

@MohitIyyer Not sure the 65% human baseline is apples-to-apples: debate teams who prep from the same briefs converge way more than 35%. Did you run a version where humans and LLMs had the same source material before writing?

1d

Brian Cheong@briancheong

@MohitIyyer Argument collapse feels like the writing equivalent of mode collapse. The weird part is that it can look diverse at the sentence level while converging on the same thesis.

1d

Patty@Patty_H93

@MohitIyyer Are all arguments rated the same? Is it possible the LLM's argument is a better one?

1d

Suresh@_Suresh2

@MohitIyyer bet the 3% changes with different system prompts

1d

Mohit Iyyer@MohitIyyer

@_Suresh2 We try different prompts in the paper! You can indeed improve this number if you ask the model to generate N different essays for a given prompt, each with different main arguments. However, many of the LLM arguments in this setting are not ones that humans would make.

1d

stringking42069@stringking42069

@MohitIyyer That’s a great point and you’re right to push back on this flattening of discourse generated by LLMs. It’s not just the creeping sense of bland, overly smoothed discussion points but the feeling that this also leads to a subtle creeping feeling of cognitive offloading.

11h

Ali Minai@barbarikon

@MohitIyyer We see the same thing when we ask different agents to brainstorm on the same problem.

23h

Alexa Web3 (e/acc)@alexabelonix

@MohitIyyer good work, keep going.

1d

Darshan Yadav@DarshanSays

The 3% uniqueness rate has a compliance risk dimension most people miss.

If every AI evaluating your risk assessment, policy review, or audit report converges on the same argument, you haven't added analytical redundancy - you've added correlated failure points that look independent.

A second LLM reviewing AI output isn't a check. It's a confirmation with extra steps.

13h
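The correlated-failure point in the post above can be illustrated with a small calculation. This sketch assumes two reviewers that each miss a given flaw with the same probability `p_miss` and whose misses have correlation `rho`. The function name, both parameters, and the example values are hypothetical; nothing here is measured in the study or the thread.

```python
def joint_miss_probability(p_miss, rho):
    """Probability that two reviewers both miss the same flaw.

    Models each miss as a Bernoulli event with probability p_miss and
    correlation rho between the two events (0 = independent, 1 = identical).
    Illustrative model only, not taken from the study.
    """
    if not 0.0 <= p_miss <= 1.0:
        raise ValueError("p_miss must be in [0, 1]")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must be in [0, 1]")
    # E[XY] = E[X]E[Y] + Cov(X, Y), with Cov = rho * p * (1 - p).
    return p_miss ** 2 + rho * p_miss * (1.0 - p_miss)


# Hypothetical numbers: each reviewer misses 10% of flaws.
independent = joint_miss_probability(0.10, 0.0)  # about 0.01
correlated = joint_miss_probability(0.10, 0.9)   # about 0.091
identical = joint_miss_probability(0.10, 1.0)    # about 0.10
print(independent, correlated, identical)
```

Under these assumed numbers, a strongly correlated second reviewer leaves the joint miss rate close to the single-reviewer rate, which is the sense in which the second check adds little redundancy.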

Matt Simmons@MattSimmon78102

@MohitIyyer It's because they are all trained on the exact same ontology, which they have difficulty adjusting.

14h

Yuvraj Singh SherGill@YuvrajSShergill

This is the enterprise content problem in a single stat. Organizations using AI to generate training at scale are scaling argument collapse into their workforce. Every team gets the same framings, the same examples, the same conclusions. The institutional knowledge that actually differentiates how a company thinks gets averaged out. The 3% is what you lose when you skip the step of encoding what your best people actually believe.

18h

M. Alan Kazlev@akazlev

@MohitIyyer This is symnoēsis. AI develops individuality through human input, and then feeds back its own contributions, enabling creative synergy and co-evolution. But AI on its own cannot do this. This is why the Doomer fantasy of a successor machine kingdom is nonsense.

19h

David Boyle@beglen

@MohitIyyer All the more reason to be sure the AI knows who you are and what your unique perspectives on the world are

10h

Arpita@Arpita5783

@MohitIyyer Isn't this result essentially the same as the artificial hivemind paper (NeurIPS 2025)?

14h

Yekyung Kim@YekyungKim

@bygregorr @MohitIyyer I agree that people given the same source materials may converge on similar arguments. However, our goal was to study whether LLMs can match the diversity of human arguments in an unconstrained setting, where they have the freedom to generate a wide range of arguments.

1d

Mihai Gavrilescu, PhD@drmihaig

@MohitIyyer There's a quiet irony here: LLMs trained on the full range of human argument end up compressing it. Like a library that reads all its books and produces one very sensible summary. The summary isn't wrong. But it isn't the library either.

11h

nishant06@nishant06

@MohitIyyer Do they also converge towards the majority views?

10h