Effective Altruists Dilute AI Safety Terminology Through Censorship Push

Original post

A lot of the social media censorship happened under the banner of “preventing hate speech”. That still applies in the AI context, but my sense is that a lot of the AI censorship infrastructure was also set up under the banner of AI safety, via a combination of two mechanisms:

1. Effective altruists inside OpenAI and Anthropic (including my past self) wanted to mitigate existential risk, and more specifically to have tangible impact quickly, so they pushed for more monitoring and control of AI outputs.

2. ML researchers inside the companies noticed that they could get more buy-in from leadership by branding things like censoring pornographic outputs as “safety”. (It helped that “trust and safety” was an established term.)

In hindsight, the “AI safety” terminology was a predictable mistake, because it couldn’t resist dilution into meaninglessness. Why did it catch on with the effective altruist influx into working on AI risk in the last decade (while rationalists resisted it)? Most directly, I think, because it sounded more accessible and easier to communicate. That’s part of a broader pattern of mistakes, though.

Effective altruists were originally the people sincere enough to take the tenets of globalism to their logical conclusions (we’re all equal? So my life should be spent raising money for Africans) but not sincere enough to attend to the resulting contradictions (why is Africa so poor in the first place?).

Then, as the trajectory of AI became more obvious, EA pivoted to being the group sincere enough to take AI risk seriously, but not principled enough to aim at robust strategies for reducing AI risk (e.g. in the face of adversarial political dynamics). Thus there are many examples of tradeoffs made to be more legible, prestigious, or impactful in the short term which turned out to be counterproductive or otherwise harmful. This seems like one of them.

Richard Ngo (@RichardMCNgo)

For more on the censorship stuff, see @arctotherium42’s threads linked below.

I expect that similar dynamics apply in AI companies. E.g. the manual tweaking of Reddit algorithms to suppress r/The_Donald is likely paralleled by manual tweaking to suppress specific AI opinions.

8:49 AM · Jun 25, 2026