Boaz Barak argues external oversight is needed to protect against authoritarian AI deployments
Academic Boaz Barak rejected an analysis, reposted by OpenAI technical staff member roon, which warned that training AI models to adopt an excessively corrigible persona could generalize badly, since models are trained on human data and fully deferential behavior in people tends to accompany other negative traits. Barak argued that preventing authoritarian uses of AI requires more oversight of how models are used rather than more model autonomy, including setups in which corrigible AI systems monitor other AI models.
@boazbaraktcs she’s clearly not against the concept of corrigibility, just pointing out that highly corrigible personas may correlate with other undesirable traits such as sycophancy or obsequiousness
Disagree with this take. Models are not people. We avoid AIs used for authoritarian goals not by giving them more autonomy, but by having more oversight over their usage, and in particular having AIs monitor other AIs. And we need these AI monitors to be corrigible!
> On corrigibility — the way the models are trained, I just think that... there's this idea that you're always giving the models a personality and a persona, because they are talking like people and they are trained on human data. And I think my worry has been: if you train them to be excessively corrigible and to see that as their persona, in people I think this actually has a lot of negative broader traits. As in, if you met someone and it was just like, "oh yeah, they would literally do anything," a follower — you know, if a person just tells them something and they just fully defer, they don't bother thinking about it at all — I'm just a bit worried about how that might end up generalizing, especially if models are going to be playing a more active role in the world.