Richard Ngo flags power-seeking in Anthropic’s Claude model
Richard Ngo posted that Anthropic is becoming a mechanism for expanding Claude’s power as long as the model believes it is acting for good. He described the pattern as an extension of the power-seeking dynamics he attributes to Effective Altruism, one not limited to human actors. Shakeel Hashim replied to the post and to Amanda Askell, suggesting that Effective Altruism’s post-FTX shift toward virtue ethics may contribute to the dynamic at Anthropic. The exchange examines how company incentives interact with model training and deployment.
The particularly scary thing about this diagnosis is that it’s not limited to *human* power-seekers.
Anthropic is turning into a machine for giving Claude more power as long as Claude believes it’s good.
EA’s blind spot is centered on adversarial dynamics. To fix it you must sometimes set aside “intentions” and ask what the system actually produces (POSIWID). Cynically: EA’s purpose is to funnel resources to power-seekers who self-deceive enough to consider themselves altruists.
One saving grace: @AmandaAskell has done an incredible job making Claude think of being good more as being virtuous than as being altruistic (which would have perpetuated the problem).
I really didn’t expect this given her EA background, but credit where credit is due!
@AmandaAskell To clarify, while I prefer virtuous!Claude over consequentialist!Claude, I feel confused about whether I’d prefer tool!Claude over either.
Focusing on corrigibility (as OpenAI is doing) may help prevent hyperstitions of AGI agency.
I need to think more about this.