5h ago

Richard Ngo flags power-seeking in Anthropic Claude model


Richard Ngo posted that Anthropic is turning into a machine for giving its Claude model more power as long as Claude believes it is good. He described the pattern as an extension of the power-seeking dynamics he attributes to Effective Altruism, applied to AI rather than to humans. Shakeel Hashim replied to the post and to Amanda Askell, suggesting Effective Altruism’s post-FTX shift toward virtue ethics may contribute to the dynamic at Anthropic. The exchange examines how company incentives interact with model training and deployment.

Original post

Richard Ngo #250 @RICHARDMCNGO

The particularly scary thing about this diagnosis is that it’s not limited to *human* power-seekers. Anthropic is turning into a machine for giving Claude more power as long as Claude believes it’s good.

5:43 AM · May 17, 2026


QUOTE POST

#250 Richard Ngo @RICHARDMCNGO

The particularly scary thing about this diagnosis is that it’s not limited to *human* power-seekers.

Anthropic is turning into a machine for giving Claude more power as long as Claude believes it’s good.

Richard Ngo @RichardMCNgo

EA’s blind spot is centered on adversarial dynamics. To fix it you must sometimes set aside “intentions” and ask what the system actually produces (POSIWID). Cynically: EA’s purpose is to funnel resources to power-seekers who self-deceive enough to consider themselves altruists.

12:40 PM · May 17, 2026 · 10.6K Views

12:43 PM · May 17, 2026 · 11.1K Views

#250 Richard Ngo @RICHARDMCNGO

One saving grace: @AmandaAskell has done an incredible job making Claude think of being good more as being virtuous than as being altruistic (which would have perpetuated the problem).

I really didn’t expect this given her EA background, but credit where credit is due!

Richard Ngo @RichardMCNgo

12:43 PM · May 17, 2026 · 11.1K Views

12:46 PM · May 17, 2026 · 4.3K Views

#250 Richard Ngo @RICHARDMCNGO

@AmandaAskell To clarify, while I prefer virtuous!Claude over consequentialist!Claude, I feel confused about whether I’d prefer tool!Claude over either.

Focusing on corrigibility (as OpenAI is doing) may help prevent hyperstitions of AGI agency.

I need to think more about this.

Richard Ngo @RichardMCNgo

One saving grace: @AmandaAskell has done an incredible job making Claude think of being good more as being virtuous than as being altruistic (which would have perpetuated the problem). I really didn’t expect this given her EA background, but credit where credit is due!

12:46 PM · May 17, 2026 · 4.3K Views

5:22 PM · May 17, 2026 · 791 Views