Richard Ngo argues the AI safety community's narrow focus on AGI and recursive self-improvement constrains thinking at labs


The AI safety community constructed a memeplex in which “taking AGI seriously” was a prerequisite for being a serious and good person. When inside this memeplex (as many at Anthropic, some at OpenAI, and a few at DeepMind are) your vision narrows until the world feels extremely constrained. The whole future seems to flow through the “one ring” of controlling recursive self-improvement. And so even when you worry about AI itself seizing that one ring, you can’t generate better strategies than trying to control it yourself (directly via an AGI company, or indirectly via AGI governance).

I’m not saying this is a pure hyperstition. There’s a core truth underlying this perspective: AI will become extremely intelligent and capable, much more than it is today. But the current world is much more spacious and human-empowering than the future which Eliezer originally envisioned (a “brain in a box in a basement” taking over the world by surprise). And it would be even more spacious if this memeplex weren’t active. For example, Satya and Mark and Sundar only started taking AGI seriously because OpenAI forced them to—and even now they don’t really believe in superintelligence—and even if they did they couldn’t get most of their employees on board. Imagine how chill a “race” between Microsoft and Meta and Google would have been, compared with what we have today: Dario and Sam deep in the “one ring” memeplex while also personally loathing each other.

So the one ring memeplex has an escalating life-cycle. It infects people by letting them harness the narrative that they’re good people for taking AGI seriously, and that making other people take AGI seriously is a boon for the world (despite how terribly that’s gone so far). Then it shuts off their imagination—any sparks of creativity or plans that don’t steer towards the one ring are quickly shut down. Instead they make ChatGPT or the METR graph or other recruiting tools for the memeplex. And yes, they’ll acknowledge that previous versions of the memeplex were too extreme, and led to overly constricted action. But we don’t have time to worry about that, they’ll say, because AGI is coming by 2027/2028, and that’s the end of history. Somehow, though, almost everyone with that view has only a vibes-based definition of AGI. They don’t believe in Dyson spheres by 2028, or self-replicating nanotech by 2028, or brain emulations by 2028. They mostly can’t make concrete predictions, except that it’ll be enough AI that it puts all their plans on a deadline. (Shout-out to @DKokotajlo and @paulfchristiano though, who do make concrete predictions about things going crazy soon.)

It seems very hard to break out of this memeplex without just giving up. David Holz is maybe the world champion of that—the only person who was in a position to race for AGI and consciously turned away. Various agent foundations researchers have carved out space to think real thoughts, not the kind of panicky stabbing in the dark that usually passes for safety research. A few others (e.g. Salamon, Hoffman, Vassar, Andre, Sahil, Davidad) are pursuing more unusual paths. And of the people who burned out, I expect some will reorient to doing creative thinking.

For others, the main takeaway: yes, the future of AI will be wild. But so far it’s increased peak human agency, and openness to this trend continuing over the next decade will allow you to start creating something worth creating.

roon@tszzl

@_sholtodouglas @eventidia the grim thing about the ai boom is everything feels like a distraction outside of the instrumental convergence to RSI


Susan Zhang@suchenzang

tl;dr expand your imagination

there's no need to stay trapped in permanent underclass thinking, or any class for that matter

and creating fomo through wordcelling is also a type of psych warfare to become immune to, because it is not meant to help you

(unless you're motivated by negative rewards then you do you)


Leo Gao@nabla_theta

i feel mixed opinions about this.

- obviously taking AGI seriously doesn't in itself make you a "good" person. just like taking malaria seriously doesn't make you a good person if you therefore decide to spread malaria, rather than stop it.
- obviously taking AGI seriously is a necessity for being a serious person. not taking the possibility of AGI seriously is insane, and renders you unable to make reasonable decisions about how to do good.
- obviously it was not inevitable that anyone important would take AGI seriously in 2026, and it still seems possible though unlikely that things could slow down or crash and the relevant people might once again believe AGI to be a mirage.
- i've always been confused why making people take AGI seriously is a thing that lots of people seem to think of as the most important thing. clearly convincing people that AGI is the most important thing could either channel people into making AGI, which is bad, or saving the world, which is good.
- at this point, assuming things don't crash and cause another AI winter (because perhaps we need a new paradigm to get to AGI), it's unclear whether you even need to believe in RSI to get there, because better AI is already very economically valuable today. suppose tomorrow openai and anthropic instantly disappeared. then probably msft, meta, and google will keep competing for better models, and at some point RSI will happen even if they weren't aiming for it. it will certainly happen slower, which is better, but unclear how much slower. a winter seems less and less likely every day, but it's still impossible to rule out.
- it's very based to be in a position to compete for AGI and to choose not to. wish more people did this.
- it is in fact kind of true that controlling RSI is kind of important? it doesn't immediately follow from this that you should either try to win or try to influence the winning actor, but it also seems bad to deny the truthfulness of the one ring


clare ❤️‍🔥@clarejtbirch

“one ring memeplex” is a failure of the imagination. there are many paths ahead.


zero@appelbolt

I would really like for this to be true, but it is contingent on RSI / instrumental convergence and the associated picture not actually being an accurate read of near term trajectory, isn’t it? That doesn’t seem to be the case, or at least the evidence I’m aware of doesn’t point that way. Scaling is leading to dangerous synthetic intelligence attracting government intervention in the present and the company who built it is warning early RSI has already started. Where’s the evidence that the ‘memeplex’ isn’t actually just an accurate assessment of the state of affairs?


Richard Ngo@RichardMCNgo

@chrislakin yea that’s the closest I have: https://www.amazon.com/Gentle-Romance-Stories-AI-humanity/dp/176428030X

Admittedly very far from the level of detail and realism and positivity I’d like!


Chris Lakin@chrislakin

@RichardMCNgo could you link to your preferred positive potential visions? your book?


interstice@an_interstice

@flawedaxioms @RichardMCNgo oh really, got a link? his timelines were definitely longer than that a few years ago e.g. https://www.lesswrong.com/posts/sWLLdG6DWJEy3CH7n/imo-challenge-bet-with-eliezer but obviously the last few years have been a pretty big update.


Allen Schmaltz@Allen_Schmaltz

@RichardMCNgo I'm hoping someone in this line of AI Safety can explain this to me:


Susan Zhang@suchenzang

@catboosted explains so much (in a good way) 🫡


Adrià Garriga-Alonso@AdriGarriga

@RichardMCNgo I'm definitely trapped in this memeplex unfortunately, and even knowing about it doesn't make it stop; so far I've taken the route of burning out and giving up.


Susan Zhang@suchenzang

@kernel_trick corporate adult drones, while lacking in some respects, would not be spinning tall tales of world domination through enterprise saas

(they would be explicitly trained to not communicate these things for other reasons)


flaw@flawedaxioms

@an_interstice @RichardMCNgo I'm pretty sure he's said he expects a Dyson sphere in the next 10 years or similar


Richard Ngo@RichardMCNgo

@reconfigurthing http://thegarden.pt


Elias Schmied@reconfigurthing

@RichardMCNgo who is "andre" here?


Richard Ngo@RichardMCNgo

@tszzl @_sholtodouglas @eventidia sorry, this is a kinda callous response. Working on a longer/better one.


Charlie Deck@bigblueboo

@RichardMCNgo this dynamic has been around for a while, see @Pinboard's old bewitchment-by-superintelligence pathology screed https://idlewords.com/talks/superintelligence.htm


𝑘𝑒𝑟𝑛𝑒𝑙𝑡𝑟𝑖𝑐𝑘@kernel_trick

@suchenzang > Imagine how chill a “race” between Microsoft and Meta and Google would have been

do you believe this


The space of possibility@DDAbo8

@RichardMCNgo What is the role of effective altruism in all this and similar techno ideologies?
