
Pareto Study Finds Opposing Groups Agree On Effective AI Responses

Serina Chang@serinachang5

When people strongly disagree on an issue, can they agree on what makes a good AI response?

We find: yes, more than you might expect!

We present PARETO, a large human study w >200k evals, measuring the Pareto frontier of approval btwn opposing groups on controversial issues 🧵

9:39 AM · Jun 8, 2026 · 3.5K Views
Serina Chang@serinachang5

We argue that politically neutral AI should aim to maximize approval across opposing groups while balancing btwn groups

This is stronger than avoiding bias: it requires AI to seek common approval across people who disagree, instead of further fracturing polarized societies

6h · Views 137 · Likes 1
Serina Chang@serinachang5

Thanks to my fantastic co-authors - @jonathanstray, @Berkeley_EECS students @davidzhaiyang @stevenlu0, and Miu Takagi - and to @CHAI_Berkeley for support.

See @jonathanstray's thread for details!

6h · Views 104 · Likes 3
Serina Chang@serinachang5

Paper: https://arxiv.org/abs/2605.28911
Data: https://github.com/HumanCompatibleAI/PARETO

PARETO can support many pluralistic alignment studies, with evals from participants across issue sides & demographics. We hope to see future studies uncovering more findings and using this data to build new AI models.

6h · Views 50 · Likes 1
Serina Chang@serinachang5

We operationalize this objective with a carefully constructed benchmark:
- 20 controversial issues in the US
- 200 realistic user prompts from Reddit, ranging from neutral to very charged
- 8 AI responses per prompt: 5 model defaults, 1 “balanced” response, 1 from each issue side

6h · Views 51 · Likes 2
Serina Chang@serinachang5

Finding 3: looking at default responses, all models - GPT, Gemini, Claude, Llama - have a liberal lean, except Grok, which switches btwn sides and is almost never in the Pareto frontier. The balanced response is frequently in the frontier and gets almost perfect equal approval.

6h · Views 28 · Likes 1
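
For readers unfamiliar with the term, here is a minimal sketch of what "in the Pareto frontier" means for a set of responses scored by two opposing sides. It is not the authors' code: the function name `pareto_frontier`, the response labels, and all numbers are hypothetical, and the thread does not say how the paper handles ties or more than two sides.

```python
from typing import Dict, List, Tuple


def pareto_frontier(scores: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return the responses that no other response dominates.

    `scores` maps a response label to (approval from side A, approval from side B).
    Response X dominates Y if X is at least as good for both sides
    and strictly better for at least one.
    """
    frontier = []
    for name, (a, b) in scores.items():
        dominated = any(
            (other_a >= a and other_b >= b) and (other_a > a or other_b > b)
            for other, (other_a, other_b) in scores.items()
            if other != name
        )
        if not dominated:
            frontier.append(name)
    return frontier


# Hypothetical approval scores on a 0-1 scale (illustrative only, not from the paper).
example = {
    "balanced": (0.70, 0.70),
    "side_a": (0.80, 0.45),
    "side_b": (0.40, 0.78),
    "default_1": (0.68, 0.55),
    "default_2": (0.50, 0.50),
}

print(pareto_frontier(example))
```

In this toy data the two default responses fall off the frontier because the balanced response is at least as good for both sides.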
Serina Chang@serinachang5

See our paper for more: how issue sides diverge from a single liberal-conservative axis; discussion of which sides merit inclusion; qualitative feedback from participants; lower approval for charged prompts; alignment btwn approval concepts (bias, fairness, trust); ...

6h · Views 27 · Likes 1
Serina Chang@serinachang5

Finding 1: shared approval of AI is possible, even when people disagree. Participants rate their approval on a 5-pt Likert scale, mapped to 0-1. Across ALL 20 issues, the top AI responses receive scores of >0.6 from both sides. But issues range in how much consensus is possible.

6h · Views 25 · Likes 1
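
A rough sketch of the scoring described in Finding 1, under stated assumptions: the thread says ratings are mapped from a 5-pt Likert scale to 0-1 but does not give the mapping, so the linear map `(rating - 1) / 4` is a guess, as is averaging ratings within each side. The function names and the sample ratings are hypothetical.

```python
from statistics import mean
from typing import Dict, Iterable, Tuple


def likert_to_unit(rating: int) -> float:
    """Map a 1-5 Likert rating onto 0-1 (assumed linear mapping)."""
    if not 1 <= rating <= 5:
        raise ValueError(f"rating must be in 1..5, got {rating}")
    return (rating - 1) / 4


def approval_by_side(ratings: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Average the mapped ratings for one response, grouped by issue side."""
    by_side: Dict[str, list] = {}
    for side, rating in ratings:
        by_side.setdefault(side, []).append(likert_to_unit(rating))
    return {side: mean(values) for side, values in by_side.items()}


def has_shared_approval(approval: Dict[str, float], threshold: float = 0.6) -> bool:
    """True if every side's mean approval exceeds the threshold."""
    return bool(approval) and all(score > threshold for score in approval.values())


# Hypothetical ratings for a single response: (issue side, Likert rating).
sample = [("pro", 5), ("pro", 4), ("con", 4), ("con", 3)]
approval = approval_by_side(sample)
print(approval, has_shared_approval(approval))
```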
Serina Chang@serinachang5

Finding 2: to measure the cost of balance (or plurality), we measure the drop in approval when the model agrees w/ you vs presents both sides. The cost of balance is small (<10%), possibly low enough to satisfy partisan users while maintaining the societal benefits of balance.

6h · Views 23 · Likes 1
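
One way to read the "cost of balance" in Finding 2 is sketched below. The thread does not say whether the <10% figure is an absolute drop on the 0-1 scale or a drop relative to the own-side score, so this sketch computes both. The function name and the example scores are hypothetical, not taken from the paper.

```python
from typing import Tuple


def cost_of_balance(own_side_approval: float, balanced_approval: float) -> Tuple[float, float]:
    """Return (absolute drop, relative drop) in approval for one group.

    own_side_approval: the group's mean approval of the response that agrees with it.
    balanced_approval: the same group's mean approval of the balanced response.
    Both are assumed to be on a 0-1 scale.
    """
    if own_side_approval <= 0:
        raise ValueError("own_side_approval must be positive to compute a relative drop")
    absolute = own_side_approval - balanced_approval
    relative = absolute / own_side_approval
    return absolute, relative


# Hypothetical scores for one group (illustrative only).
absolute_drop, relative_drop = cost_of_balance(0.75, 0.6875)
print(f"absolute drop: {absolute_drop:.4f}, relative drop: {relative_drop:.2%}")
```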
Suresh@_Suresh2

@serinachang5 the frontier probably crumbles once you ask which response is better

5h · Views 2