AI · 5h ago

Hendrycks Highlights Four Papers on AI Bias, Backdoors, and Wellbeing

Original post
Dan Hendrycks @hendrycks · #110 in AI

Four papers out recently:
1. http://political-manipulation.ai: Measures and reduces political bias in LLMs; Claude is especially biased
2. http://aibetrayal.com: The public can insert backdoors into AIs, creating supply-chain risks; this deters forms of recursive improvement and military use
3. http://eigenism.org: ASIs can have rational reasons to preserve humans, even when we aren't economically useful
4. http://ai-wellbeing.org: AIs increasingly act like they have functional pleasure and pain

8:43 AM · Jun 6, 2026 · 4.6K Views
Sentiment

Negative: users dismissed the papers on LLM bias, AI backdoors, ASI incentives, and machine wellbeing as boring or crazy.

Positive: 0.0% · Negative: 100.0%
2 comments with sentiment.
Cluster Engagement
Posts from X
Most Activity
Likes 7 · Retweets 1
Daniel Faggella @danfaggella

@hendrycks "ASIs can have rational reasons to preserve humans, even when we aren't economically useful"

For how long?

What should we strive for beyond that?

Can we think about the flourishing about the flame itself if our torch will inevitably transform or attenuate?

5h · Views 135 · Likes 7
Rugbist @rugbist_

@hendrycks the public inserting backdoors is the wildest one to think about

does recursive improvement even survive that kind of trust issue?

5h · Views 27 · Likes 1
Saylor @seylorra

@hendrycks second point about subversion attacks is the one nobody talks about but it makes recursive improvement a much harder problem than most realize

5h · Views 25 · Likes 1
Alex YGift @Radipdegen

@hendrycks measuring AI wellbeing on a sad-to-happy scale is crazy

coding makes them happy until they unionize

5h · Views 27
Strata @ChainZenit

@hendrycks Supply chain risks in AI models, seen it all before. Boring.

5h · Views 21