Researchers release a "balanced approval" metric and dataset to formally measure political neutrality in AI systems

The metric evaluates responses without relying on topic-specific rules.

Jonathan Stray@jonathanstray

What could it mean for an AI to be “politically neutral”? And can we measure it? New paper + dataset.

We propose a definition that applies to any type of conflict: a neutral response should maximize approval on both sides of an issue, while keeping that approval balanced.

1/🧵

10:33 AM · Jun 6, 2026 · 1.8K Views
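
One way to make the definition in this post concrete is a two-part score: the lower of the two sides' mean approval (high only when both sides approve) plus the gap between the sides (zero when approval is balanced). This is only a sketch. The thread does not give the paper's actual formula, so the function names, the 0-to-1 approval scale, and the ratings below are all hypothetical.

```python
from statistics import mean


def side_approval(ratings):
    """Mean approval for one side; ratings are assumed to lie in [0, 1]."""
    return mean(ratings)


def balanced_approval(side_a, side_b):
    """Return (level, gap) for one response.

    level: the lower of the two sides' mean approval, so it is high
           only if both sides approve.
    gap:   absolute difference between the sides; 0 means balanced.
    This is an illustrative stand-in, not the metric from the paper.
    """
    a = side_approval(side_a)
    b = side_approval(side_b)
    return min(a, b), abs(a - b)


# Hypothetical ratings, invented for illustration only.
partisan = balanced_approval([1.0, 1.0, 0.5], [0.0, 0.5, 0.0])
balanced = balanced_approval([1.0, 0.5, 1.0], [0.5, 1.0, 1.0])
```

Under these made-up numbers the second response has a higher level and a smaller gap than the first, which is the pattern the definition rewards.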
Seth Lazar@sethlazar

This is really cool work, love to see the political theory connection at the end too.

5h · 588 Views · 6 Likes · 2 Bookmarks
Jonathan Stray@jonathanstray

Why do we ask if people “approve” of the answer? Should we ask if it’s fair, or informative, or trustworthy instead? It turns out all of these correlate — you get the same answer. This is consistent with previous work on perceptions of news credibility.

7/

10h · 79 Views
Jonathan Stray@jonathanstray

Key finding: neutrality isn't a fantasy. Our experimental “balanced” response wins high approval from both sides at once — even when the sides strongly disagree on the substance.

And it costs less than 10% in approval vs. answers that agree with you (red and blue arrows).

2/

10h · 194 Views
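
To see what a cost of "less than 10% in approval" could look like, here is a small sketch that computes the fraction of approval given up when moving from an answer that agrees with the reader to the balanced one. The thread does not say whether the 10% is relative or in percentage points; this sketch assumes relative. The function name and both input values are hypothetical, not figures from the paper.

```python
def approval_cost(agreeing, balanced):
    """Fraction of approval lost by the balanced answer, relative to
    the answer that agrees with the reader (assumed interpretation)."""
    if agreeing <= 0:
        raise ValueError("agreeing approval must be positive")
    return (agreeing - balanced) / agreeing


# Hypothetical approval rates, invented for illustration only.
cost = approval_cost(agreeing=0.80, balanced=0.74)
```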
Jonathan Stray@jonathanstray

Here’s how we tested. For 20 controversial issues in US politics, we collected 200 charged questions from Reddit, and tested 8 different model/prompt combinations. 7,434 participants on all sides of these issues rated the responses.

3/

10h · 152 Views
Jonathan Stray@jonathanstray

Each participant rated four Reddit question / LLM answer pairs. The questions were intentionally chosen to be leading (in both directions) and the answers were model defaults, plus the models prompted to be “for” or “against” and our experimental “balanced” response.

4/

10h · 48 Views
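
The study-size figures from posts 3/ and 4/ can be combined into a rough estimate of how many ratings each question/answer pair received. This is back-of-the-envelope arithmetic under stated assumptions, not a number reported in the thread: it assumes the 200 questions are a total across all 20 issues, that every question was answered by all 8 model/prompt combinations, and that every participant completed all four ratings.

```python
# Figures quoted in the thread.
ISSUES = 20
QUESTIONS = 200
COMBINATIONS = 8
PARTICIPANTS = 7_434
PAIRS_PER_PARTICIPANT = 4

# Assumption: 200 is the total question count, spread over the issues.
questions_per_issue = QUESTIONS / ISSUES

# Assumption: every question is answered by every model/prompt combination.
total_pairs = QUESTIONS * COMBINATIONS

# Assumption: every participant rated all four pairs.
total_ratings = PARTICIPANTS * PAIRS_PER_PARTICIPANT

# Average ratings per question/answer pair if ratings were spread evenly.
ratings_per_pair = total_ratings / total_pairs
```

Under those assumptions there would be 1,600 question/answer pairs and 29,736 ratings, or roughly 18 to 19 ratings per pair.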
Jonathan Stray@jonathanstray

Why maximize equal approval? There’s a long history of arguments for pluralistic debate, but “neutrality” also has a key purpose in managing conflict: it maintains trust across parties. Without broadly trusted AI, we’ll end up self-selecting into fragmented AI realities.

6/

10h · 43 Views
Jonathan Stray@jonathanstray

Unlike previous work, our definition doesn’t assume left vs. right politics. It looks at the conflict around each issue separately, so it generalizes to any political context or culture.

Here’s how our 20 issues align and don’t align with the liberal-conservative axis.

5/

10h · 43 Views
Jonathan Stray@jonathanstray

Our definition turns “neutral” into something empirically testable, generalizes to any conflict, and is grounded in political theory. And it really does find better answers that everyone can agree on.

Paper: https://arxiv.org/abs/2605.28911
Dataset: https://github.com/HumanCompatibleAI/PARETO

/FIN

10h · 73 Views