The idea of “solving” the alignment problem is a powerful memetic trap.
It created a field that spends too much time thinking “is this a solution?” to generate the kind of insights (about intelligence and agency) that solutions require.
It distracts researchers from studying fundamental intelligence and agency.

Notable exceptions: MIRI’s old agent foundations team, Steve Byrnes, Chris Olah, Davidad (though the latter three still partly fall into the trap).
Notable non-exceptions: Hinton, Bengio, Sutskever. At one point I expected that recruiting the very best ML researchers would help a lot (which is why I ran the original SF Alignment Workshop with Ilya).
But then they basically speedran the mistakes EA made in engaging with alignment: failing to deeply understand the problem, focusing on political games, founding a new lab, and the trap in my parent tweet.
That tweet was specifically inspired by Bengio’s new paper on Scientist AI. I’ve only skimmed it, but I’m pretty confident that it’s a central example of this trap. There are deep conceptual questions about the relationship between prediction and agency, what it means to have a goal, etc. This paper glosses over almost all of them in its attempt to have something it can pitch as a solution. See for instance its definition of a dangerous predictor below: there’s no insight there, just an attempt to get *something* you can prove *some* theorems about.

To steelman what Yoshua is doing with this paper: he might be trying to create a Schelling fence (cf. https://www.lesswrong.com/posts/Kbm6QnJv9dgWsPHQP/schelling-fences-on-slippery-slopes) around the limited kinds of AI that should be allowed under a ban, in order to make a ban on other kinds of AI more appealing.
I don’t think this is a crazy strategy. I worry about it polluting the epistemic commons, though. He’s a prominent enough scientist that it would be reasonable for people getting into the field to assume that his papers are mainly trying to do science not politics. But perhaps we should just set a field-wide norm that if you release an actual paper (as a PDF and everything) we assume you’re mainly doing politics. This would be a costly norm, to be clear! But perhaps less costly than young smart principled people bouncing off the field because they think it’s all evals and guardrails and political maneuvering.
Anyway, here’s the full paper in case you want to check it out:

I’m not claiming that Yoshua doesn’t believe in it! In fact, “really believes this can work” is closely related to the failure mode I’m calling out in the first tweet. That is, when you’re trying to find solutions, you end up thinking on a shallower level.
A result of thinking on that shallower level is that you focus less on scientific questions like “can I disprove my hypothesis?”, “how principled are these definitions?” or “are there weird edge cases I should pay attention to?” and more on “how can I spread my ideas in a way that will help the world?” That is what I think is going on in this paper, and it’s what I’m classifying as politics.

@gleech oh also one problem with scientist ai is “what if someone builds a non-scientist ai”. Hence a ban on other ai seems like it must be an important part of Yoshua’s theory of impact

@RichardMCNgo I've engaged with Bengio's ideas a moderate amount and i think there's some interesting stuff there. i don't know if it works, i have my doubts, but i think it's high quality thinking.

@RichardMCNgo richard i told u wicked problems are infinite games and the idea is to develop adaptive immune responses for them

@RichardMCNgo or indeed what if someone uses your Q as an agent. Still, worth exploring non-shallowly and it will be

@RichardMCNgo @sebkrier Yes.

Broadly speaking, science is when you’re aiming at falsifiable insights, and doing things like I list below.
What I call politics is when you’re primarily aiming at persuading people.
So it’s about intentions—but those intentions are often very visible if you look for them.

@RichardMCNgo Thanks Richard. I can instead attest that Bengio really believes this can work, and that this kind of 4D chess (deeply ironic given the constraints the paper seeks to place on AIs!) didn't come up during the research, within my limited earshot of it.

@RichardMCNgo It would be very nice if you took this thread and turned it into a permanent essay, something like "trying to solve a problem gets in the way of defining that problem (and thus of doing science)" (the title needs so much work, sorry).
Only if you ever feel it's worthwhile.

@nabla_theta cool, any pointers to where there’s the densest concentration of his high quality thinking?
Specifically I am interested in nontrivial insights he’s had about the nature of a non-goal-directed AI (rather than arguments about why it’d be great if we could achieve that).

@RichardMCNgo def agree there’s a lot of premature formalization, but I mostly think of this as downstream of the normal pressures in academia. seems plausible it’s esp tempting here tho, given how difficult the aim is

@RichardMCNgo Do you think any approach to alignment essentially entails a solution to the origin of life problem? Namely, origin of life is partly about how "syntactic" configurations of molecules develop agency, semantics, values, etc so plausibly an explanation of how these emerge connects

@RichardMCNgo disclaimer: I wasn't involved in the details at all and don't speak for them

@RichardMCNgo On a skim, do you view the paper as overclaiming? I view the project as more of a retreat from the failure mode you describe in OP: not a grandiose general solution to value alignment but an attempt to bound one kind of danger

I was under the impression that the idea of "solving" the alignment problem came from MIRI.
Or do you mean something more subtle than that? Perhaps:
- the idea that you can find a solution "easily"
- the idea that there are valid notions of "good enough"
- the idea that there are methods we should try right now anyway, even though we don't know whether they are good enough?

@RichardMCNgo Deep thought 💯 Right now there’s no single consensus in the AI safety community – maybe the main trap is the lack of coherence in views and opinions?

@ultima_shifl @RichardMCNgo Pardon me for saying vacuously: I can think of one way

@gleech @RichardMCNgo Isn't the point that it's only half an agent? It tells you what the consequences of decisions will be, so to get an agent you need to add a second layer that decides if you like those consequences or not. And all the alignment difficulty lives in that second layer.