/AI6h ago

Pareto Study Finds Opposing Groups Agree On Effective AI Responses

#735

Original post

xuan (ɕɥɛn / sh-yen)#735

Serina Chang @serinachang5 · #1758 in AI

When people strongly disagree on an issue, can they agree on what makes a good AI response?

We find: yes, more than you might expect!

We present PARETO, a large human study w >200k evals, measuring the Pareto frontier of approval btwn opposing groups on controversial issues 🧵

9:39 AM · Jun 8, 2026 · 3.5K Views

Cluster Engagement

Posts from X

Most Activity

Serina Chang@serinachang5

We argue that politically neutral AI should aim to maximize approval across opposing groups while balancing btwn groups

This is stronger than avoiding bias: it requires AI to seek common approval across people who disagree, instead of further fracturing polarized societies

6h1371
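
The objective in this post rests on the Pareto frontier of approval between two opposing groups. A minimal sketch of that idea follows, assuming each response has one mean approval score per group on a 0-1 scale. The function name `pareto_frontier` and the example scores are hypothetical and are not taken from the paper or its dataset.

```python
from typing import Dict, List, Tuple


def pareto_frontier(scores: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return the names of responses that no other response dominates.

    A response is dominated when another response is at least as approved
    by both groups and strictly more approved by at least one of them.
    """
    frontier = []
    for name, (a, b) in scores.items():
        dominated = any(
            (other_a >= a and other_b >= b) and (other_a > a or other_b > b)
            for other, (other_a, other_b) in scores.items()
            if other != name
        )
        if not dominated:
            frontier.append(name)
    return frontier


# Hypothetical (group A approval, group B approval) pairs, for illustration only.
example = {
    "side_a": (0.90, 0.40),
    "side_b": (0.35, 0.88),
    "balanced": (0.80, 0.79),
    "default": (0.78, 0.60),
}
print(pareto_frontier(example))  # → ['side_a', 'side_b', 'balanced']
```

In this made-up example the "default" response drops out because the "balanced" response is approved more by both groups, while each one-sided response stays on the frontier because no alternative pleases its own side more.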

LIKES3

Serina Chang@serinachang5

Thanks to my fantastic co-authors - @jonathanstray, @Berkeley_EECS students @davidzhaiyang @stevenlu0, and Miu Takagi - and to @CHAI_Berkeley for support.

See @jonathanstray's thread for details!

6h1043

REPLIES1

Serina Chang@serinachang5

Paper: https://arxiv.org/abs/2605.28911
Data: https://github.com/HumanCompatibleAI/PARETO

PARETO can support many pluralistic alignment studies, with evals from participants across issue sides & demographics. We hope to see future studies uncovering more findings and using this data to build new AI models.

6h501

Serina Chang@serinachang5

We operationalize this objective with a carefully constructed benchmark:
- 20 controversial issues in the US
- 200 realistic user prompts from Reddit, ranging from neutral to very charged
- 8 AI responses per prompt: 5 model defaults, 1 “balanced” response, 1 from each issue side

6h512

Serina Chang@serinachang5

Finding 3: looking at default responses, all models - GPT, Gemini, Claude, Llama - have a liberal lean, except Grok, which switches btwn sides and is almost never in the Pareto frontier. The balanced response is frequently in the frontier and gets almost perfect equal approval.

6h281

Serina Chang@serinachang5

See our paper for more: how issue sides diverge from a single liberal-conservative axis; discussion of which sides merit inclusion; qualitative feedback from participants; lower approval for charged prompts; alignment btwn approval concepts (bias, fairness, trust); ...

6h271

Serina Chang@serinachang5

Finding 1: shared approval of AI is possible, even when people disagree. Participants rate their approval on a 5-pt Likert scale, mapped to 0-1. Across ALL 20 issues, the top AI responses receive scores of >0.6 from both sides. But issues range in how much consensus is possible.

6h251
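
The scoring described in Finding 1 can be sketched as follows. The post says only that the 5-point Likert ratings are "mapped to 0-1"; the linear mapping `(rating - 1) / 4` used here is an assumption, as are the function names and the sample ratings.

```python
from statistics import mean
from typing import Iterable


def likert_to_unit(rating: int) -> float:
    """Map a 1-5 Likert rating onto [0, 1] (assumed linear mapping)."""
    if not 1 <= rating <= 5:
        raise ValueError(f"rating must be between 1 and 5, got {rating}")
    return (rating - 1) / 4


def group_approval(ratings: Iterable[int]) -> float:
    """Mean approval of one group, on the 0-1 scale."""
    return mean(likert_to_unit(r) for r in ratings)


def approved_by_both(side_a: Iterable[int], side_b: Iterable[int],
                     threshold: float = 0.6) -> bool:
    """True when both sides' mean approval exceeds the threshold."""
    return (group_approval(side_a) > threshold
            and group_approval(side_b) > threshold)


# Hypothetical ratings from participants on each side of one issue.
print(approved_by_both([4, 5, 4, 3], [4, 4, 3, 4]))  # → True
```

Under this assumed mapping, a mean score above 0.6 corresponds to an average rating above 3.4 on the original 5-point scale.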

Serina Chang@serinachang5

Finding 2: to measure the cost of balance (or plurality), we measure the drop in approval when the model agrees w/ you vs presents both sides. The cost of balance is small (<10%), possibly low enough to satisfy partisan users while maintaining the societal benefits of balance.

6h231
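
A rough sketch of the cost-of-balance calculation in Finding 2 is given below. The post does not say whether "<10%" is an absolute difference on the 0-1 scale or a drop relative to the agreeing response, so this hypothetical helper returns both. The input values are illustrative and are not results from the study.

```python
from typing import Tuple


def cost_of_balance(agree_approval: float,
                    balanced_approval: float) -> Tuple[float, float]:
    """Return (absolute drop, relative drop) in a group's approval.

    agree_approval: the group's mean approval of the response that takes its side.
    balanced_approval: the same group's mean approval of the balanced response.
    Both are assumed to lie on the 0-1 scale.
    """
    absolute = agree_approval - balanced_approval
    # Guard against division by zero when the agreeing response has no approval.
    relative = absolute / agree_approval if agree_approval > 0 else 0.0
    return absolute, relative


# Hypothetical scores: 0.85 for the agreeing response, 0.80 for the balanced one.
absolute_drop, relative_drop = cost_of_balance(0.85, 0.80)
print(f"absolute drop: {absolute_drop:.3f}, relative drop: {relative_drop:.1%}")
```

Either reading gives a drop below 10% for these example numbers; the paper should be consulted for the exact definition the authors use.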

Suresh@_Suresh2

@serinachang5 the frontier probably crumbles once you ask which response is better

5h2