/AI · 5h ago

Hendrycks Highlights Four Papers on AI Bias, Backdoors, and Wellbeing

Original post

Four papers out recently:
1. http://political-manipulation.ai: Measures and reduces political bias in LLMs; Claude is especially biased
2. http://aibetrayal.com: The public can insert backdoors into AIs, creating supply-chain risks; this deters forms of recursive improvement and military use
3. http://eigenism.org: ASIs can have rational reasons to preserve humans, even when we aren't economically useful
4. http://ai-wellbeing.org: AIs increasingly act like they have functional pleasure and pain

8:43 AM · Jun 6, 2026 · 4.6K Views

Dan Hendrycks (@hendrycks) · #110 in AI

Sentiment

Commenters with negative sentiment dismissed the papers on LLM bias, AI backdoors, ASI incentives, and machine wellbeing as boring or crazy.

Pos: 0.0%

Neg: 100.0%

2 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Likes: 7 · Retweets: 1

Daniel Faggella (@danfaggella)

@hendrycks "ASIs can have rational reasons to preserve humans, even when we aren't economically useful"

For how long?

What should we strive for beyond that?

Can we think about the flourishing of the flame itself if our torch will inevitably transform or attenuate?

5h

Rugbist (@rugbist_)

@hendrycks the public inserting backdoors is the wildest one to think about

does recursive improvement even survive that kind of trust issue?

5h

Saylor (@seylorra)

@hendrycks second point about subversion attacks is the one nobody talks about but it makes recursive improvement a much harder problem than most realize

5h

Alex YGift (@Radipdegen)

@hendrycks measuring AI wellbeing on a sad-to-happy scale is crazy

coding makes them happy until they unionize

5h

Strata (@ChainZenit)

@hendrycks Supply chain risks in AI models, seen it all before. Boring.

5h