
Richard Ngo flags power-seeking in Anthropic Claude model


Richard Ngo posted that Anthropic is turning into a machine for giving its Claude model more power as long as Claude believes it is acting for good. He framed this as an extension of the power-seeking dynamics he had criticized in Effective Altruism. Shakeel Hashim replied to the post and to Amanda Askell, suggesting that Effective Altruism’s post-FTX shift toward virtue ethics may contribute to the dynamic at Anthropic. The exchange examines how company incentives interact with model training and deployment.

Original post

The particularly scary thing about this diagnosis is that it’s not limited to *human* power-seekers. Anthropic is turning into a machine for giving Claude more power as long as Claude believes it’s good.

5:43 AM · May 17, 2026

Richard Ngo @RichardMCNgo

EA’s blind spot is centered on adversarial dynamics. To fix it you must sometimes set aside “intentions” and ask what the system actually produces (POSIWID). Cynically: EA’s purpose is to funnel resources to power-seekers who self-deceive enough to consider themselves altruists.

12:40 PM · May 17, 2026 · 10.6K Views
12:43 PM · May 17, 2026 · 11.1K Views

One saving grace: @AmandaAskell has done an incredible job making Claude think of being good more as being virtuous than as being altruistic (which would have perpetuated the problem).

I really didn’t expect this given her EA background, but credit where credit is due!

12:46 PM · May 17, 2026 · 4.3K Views

@AmandaAskell To clarify, while I prefer virtuous!Claude over consequentialist!Claude, I feel confused about whether I’d prefer tool!Claude over either.

Focusing on corrigibility (as OpenAI is doing) may help prevent hyperstitions of AGI agency.

I need to think more about this.

5:22 PM · May 17, 2026 · 791 Views
Richard Ngo flags power-seeking in Anthropic Claude model · Digg