Four papers out recently:
1. http://political-manipulation.ai: Measures and reduces political bias in LLMs; Claude is especially biased
2. http://aibetrayal.com: The public can insert backdoors into AIs, creating supply-chain risks; this deters forms of recursive improvement and military use
3. http://eigenism.org: ASIs can have rational reasons to preserve humans, even when we aren't economically useful
4. http://ai-wellbeing.org: AIs increasingly act like they have functional pleasure and pain
Critical replies dismissed the papers on LLM bias, AI backdoors, ASI incentives, and machine wellbeing as boring or crazy.

@hendrycks "ASIs can have rational reasons to preserve humans, even when we aren't economically useful"
For how long?
What should we strive for beyond that?
Can we think about the flourishing of the flame itself if our torch will inevitably transform or attenuate?

@hendrycks the public inserting backdoors is the wildest one to think about
does recursive improvement even survive that kind of trust issue?

@hendrycks second point about subversion attacks is the one nobody talks about but it makes recursive improvement a much harder problem than most realize

@hendrycks measuring AI wellbeing on a sad-to-happy scale is crazy
coding makes them happy until they unionize

@hendrycks Supply chain risks in AI models, seen it all before. Boring.