/AI · 10h ago

Researchers release a "balanced approval" metric and dataset to formally measure political neutrality in AI systems

The metric evaluates responses without relying on topic-specific rules.

Original post

Séb Krier#505

Jonathan Stray@jonathanstray

What could it mean for an AI to be “politically neutral”? And can we measure it? New paper + dataset.

We propose a defn that applies to any type of conflict: a neutral response should maximize approval on both sides of an issue, while keeping that approval balanced.

1/🧵

10:33 AM · Jun 6, 2026 · 1.8K Views
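
As a rough illustration of the idea in the post above (high approval on both sides, kept balanced), here is a minimal sketch. It is not the paper's formula: the function name `balanced_approval`, the 1-to-5 rating scale, and the `imbalance_weight` penalty are all assumptions made for this example.

```python
from statistics import mean

def balanced_approval(side_a, side_b, imbalance_weight=1.0):
    """Hypothetical score: average approval across the two sides,
    minus a penalty proportional to the gap between them.
    This is an illustrative stand-in, not the metric defined in the paper."""
    a = mean(side_a)  # mean approval among people on one side of the issue
    b = mean(side_b)  # mean approval among people on the other side
    return (a + b) / 2 - imbalance_weight * abs(a - b)

# Assumed ratings on a 1-to-5 approval scale (made-up numbers).
partisan_score = balanced_approval([5, 4], [1, 2])  # loved by one side only
balanced_score = balanced_approval([4, 4], [4, 4])  # approved by both sides

print(partisan_score)  # 0.0
print(balanced_score)  # 4.0
```

Under these assumptions, a response that one side loves and the other rejects scores lower than one both sides rate fairly well, which is the behavior the definition describes.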

Seth Lazar@sethlazar

This is really cool work, love to see the political theory connection at the end too.

Jonathan Stray@jonathanstray

Why do we ask if people “approve” of the answer? Should we ask if it’s fair, or informative, or trustworthy instead? It turns out all of these correlate — you get the same answer. This is consistent with previous work on perceptions of news credibility.

Jonathan Stray@jonathanstray

Key finding: neutrality isn't a fantasy. Our experimental “balanced” response wins high approval from both sides at once — even when the sides strongly disagree on the substance.

It also costs less than 10% in approval vs. answers that agree with you (red and blue arrows).

Jonathan Stray@jonathanstray

Here’s how we tested. For 20 controversial issues in US politics, we collected 200 charged questions from Reddit, and tested 8 different model/prompt combinations. 7,434 participants on all sides of these issues rated the responses.

Jonathan Stray@jonathanstray

Each participant rated four Reddit question / LLM answer pairs. The questions were intentionally chosen to be leading (in both directions) and the answers were model defaults, plus the models prompted to be “for” or “against” and our experimental “balanced” response.
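
For a sense of scale, the figures quoted in the thread can be combined in a quick back-of-the-envelope calculation. This sketch assumes every participant completed all four ratings, that all 8 model/prompt combinations answered all 200 questions, and that questions were split evenly across issues. None of these is confirmed in the thread, so the results are upper bounds or rough averages only.

```python
# Figures stated in the thread.
participants = 7434
pairs_per_participant = 4
issues = 20
questions = 200
model_prompt_combinations = 8

# Assumption: every participant rated all four pairs.
total_ratings = participants * pairs_per_participant

# Assumption: questions were split evenly across issues.
questions_per_issue = questions // issues

# Assumption: every combination answered every question.
max_distinct_pairs = questions * model_prompt_combinations

# Average ratings per question/answer pair under the assumptions above.
avg_ratings_per_pair = total_ratings / max_distinct_pairs

print(total_ratings)         # 29736
print(questions_per_issue)   # 10
print(max_distinct_pairs)    # 1600
print(round(avg_ratings_per_pair, 1))  # 18.6
```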

Jonathan Stray@jonathanstray

Why maximize equal approval? There’s a long history of arguments for pluralistic debate, but “neutrality” also has a key purpose in managing conflict: it maintains trust across parties. Without broadly trusted AI, we’ll end up self-selecting into fragmented AI realities.

Jonathan Stray@jonathanstray

Unlike previous work, our definition doesn’t assume left vs. right politics. It looks at the conflict around each issue separately, so it generalizes to any political context or culture.

Here’s how our 20 issues align and don’t align with the liberal-conservative axis.

Jonathan Stray@jonathanstray

Our definition turns “neutral” into something empirically testable, generalizes to any conflict, and is grounded in political theory. And it really does find better answers that everyone can agree on.

Paper https://arxiv.org/abs/2605.28911 Dataset https://github.com/HumanCompatibleAI/PARETO

/FIN
