DeepMind scientist Alex Turner replies to Rob Wiblin
Alex Turner, a research scientist on Google DeepMind's scalable alignment team, replied to Rob Wiblin asking which parties actually advocate banning AI-risk discussion. The exchange grew out of an image in Wiblin's post and the surrounding policy debate. Wiblin argued that critics who blame open AI discourse for self-fulfilling misalignment implicitly propose an outright ban, since a voluntary taboo would never have achieved significant, let alone universal, compliance; that those same critics would oppose such a ban exposes the inconsistency in their stance. The thread linked to Zvi's Substack post AI #168: Not Leading the Future.
@ShakeelHashim @michaelsobolik I’m sorry, but is it your view as a journalist that a company should soften its threat assessment to preserve diplomatic atmospherics on safety cooperation? Should journalists soften their facts/stories too? I find it strange.
@robertwiblin Who is saying we should ban discussion? (real question)
@Turn_Trout Nobody directly, but some discussion of the discourse here: https://thezvi.substack.com/p/ai-168-not-leading-the-future?open=false#%C2%A7i-learned-it-by-reading-you
The reason I'm making fun of it is that, for people who imply this shows the people who discussed the issue are to blame, a ban seems to be the actual implied proposal, given that a voluntary taboo never would have gotten significant, let alone universal, compliance (and the fact that they'd oppose that speaks for itself).
@Turn_Trout If you filter the training dataset, is the negative externality real?
@robertwiblin Yeah but it's in fact true that self-fulfilling misalignment is a negative impact of LW discourse. That doesn't mean we should ban the discourse or that it was overall bad to discuss. But the negative externality is real
@daniel_271828 @Turn_Trout @slatestarcodex Yeah I'm not following Alex - if someone's technical alignment strategy fails when someone publishes a sci-fi story about an AI doing something bad, doesn't that seem like a bad/fragile strategy?
@Turn_Trout @robertwiblin @slatestarcodex My point was not that your alignment strategy should be robust to including specific data in training (point 2 was even suggesting filtering this data out); it was that the strategy should be robust to people in the world having the conversation
@robertwiblin @slatestarcodex Or consider some responses which are like "if your alignment strategy isn't robust to this, it's dumb", which seems like an appeal to how people think the world should work (instead of how it might actually work)
This is such a mid take: 1) “rogue AI” is a common trope, not invented by LW; 2) if that solves the problem, then just filter the relevant posts out of training! 3) your alignment strategy should be robust to “someone, somewhere writes about the possibility of rogue AI”
(Not replying on X, engage at https://bsky.app/profile/turntrout.bsky.social)
Lots of hubbub about "is LW to blame for self-fulfilling misalignment"
1. If a scientist builds a machine which does bad things because people said it would, it's NOT the people's fault (morally).
2. Balance of evidence is that YES, LW & doom-speculation contributed to the problem.
3. That doesn't mean it was wrong to speculate or that we should ban further speculation. (But be mindful with large data dumps: https://turntrout.com/dataset-protection... less responsible labs won't mitigate.)
4. It IS still true that the speculation had a negative externality (a "sociohazard", as it were). Just acknowledge the facts and move on. No need to be defensive about it. (Keeping in mind that MY early speculation is included here.)
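The filtering idea (point 2 of the "mid take" tweet above) and the dataset-protection caution in point 3 of the thread both gesture at a concrete mitigation: drop doom-speculation and opted-out text from pretraining data. A minimal sketch of such a filter follows, assuming documents arrive as plain strings; the canary value and keyword list are hypothetical stand-ins, and a production pipeline would rely on trained classifiers rather than regex matching.

```python
import re
from typing import Iterable, Iterator

# Hypothetical markers, for illustration only: a canary string lets authors
# opt a document out of training sets, and the keywords are a crude proxy
# for rogue-AI doom-speculation.
CANARY = "EXAMPLE-CANARY-STRING-DO-NOT-TRAIN"
DOOM_KEYWORDS = re.compile(
    r"\b(rogue AI|paperclip maximizer|treacherous turn|AI takeover)\b",
    re.IGNORECASE,
)

def filter_corpus(docs: Iterable[str]) -> Iterator[str]:
    """Yield only documents with no canary string and no doom-speculation markers."""
    for doc in docs:
        if CANARY in doc:
            continue  # honor the author's opt-out request
        if DOOM_KEYWORDS.search(doc):
            continue  # drop speculative rogue-AI discussion
        yield doc

if __name__ == "__main__":
    corpus = [
        "A recipe for sourdough bread.",
        "In the story, a rogue AI seizes the data center.",
        "Private eval set. " + CANARY,
    ]
    print(list(filter_corpus(corpus)))  # only the bread recipe survives
```

Keyword filtering this blunt would also discard legitimate alignment research, which is one reason "just filter it out" is easier to say than to do well.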
I don’t think the balance of evidence is in favor of your point 2! The only evidence people cite for this is a few extremely vague words in a tweet thread.
Insofar as anyone takes this seriously as a risk vector, things like “The Terminator” movies and associated discussion of AI in media seem more likely to contribute here. Or maybe not; I don’t think anyone has actually given any concrete evidence here.