I'm seeing a lot of hate for Anthropic's decision to secretly nerf AI RnD capabilities.
But I haven't seen critics engage with the imo strongest defence of Anthropic:
1. By far the biggest risks are from superintelligent AI.
2. To manage these risks the leading company will need to pause partway through the intelligence explosion.
(Pausing at this time allows them to a) generate the compelling empirical evidence of misalignment that will be needed to justify a longer global pause, AND b) use powerful AI to massively accelerate alignment progress. A pause today couldn't accomplish either.)
3. A pause is MUCH more likely if the leading company has a big lead. It's much less likely if multiple companies are neck and neck.
(More specifically, Anthropic had good reason to think OpenAI wouldn't pause. This makes it very hard for Anthropic to pause if they're neck and neck. Hopefully recent announcements build mutual trust that everyone will pause.)
4. If lagging AI companies can use the leader's AI for AI RnD during an intelligence explosion, the leader *cannot* maintain their lead.
(This point is underappreciated. If you model out the intelligence explosion, you'll find that a laggard with equal access to the leading AI quickly catches up to the leader, because the leader faces big headwinds from having already plucked the low-hanging fruit.)
5. So: sharing AI RnD access with competitors massively decreases the chance of a pause at the critical time, and massively increases the risk from superintelligent AI.
6. Anthropic can't block competitors from using Mythos without the silent sabotage. For the obvious reason: it's very hard for a frozen safeguard to block someone who can iterate against it. It sucks that this is the only way, but it is.
7. They've long had terms of service against competitors using Claude for AI RnD, and they have a right to enforce those terms.
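
A toy sketch of the catch-up dynamic in point 4, under made-up assumptions (this is not a calibrated model of anything). Each lab has a capability level; its rate of progress rises with the capability of the AI doing the research (exponent `a`) and falls with its own level, standing in for low-hanging fruit already being plucked (exponent `b`). The function name, the functional form, and all parameter values here are hypothetical choices for illustration.

```python
def simulate(share_access, steps=200, dt=0.1, r=0.1, a=2.0, b=1.0):
    """Euler-step two labs forward; return (leader, laggard) capability levels.

    Assumed toy dynamics: d(level)/dt = r * research_ai**a / level**b
    - research_ai: capability of the AI the lab uses for its AI RnD
    - level**b:    difficulty term (low-hanging fruit already plucked)
    """
    leader, laggard = 2.0, 1.0  # arbitrary starting levels, leader ahead
    for _ in range(steps):
        leader_ai = leader
        # With shared access the laggard does research with the leader's AI.
        laggard_ai = leader if share_access else laggard
        d_leader = r * leader_ai**a / leader**b
        d_laggard = r * laggard_ai**a / laggard**b
        leader += dt * d_leader
        laggard += dt * d_laggard
    return leader, laggard


if __name__ == "__main__":
    for share in (False, True):
        leader, laggard = simulate(share_access=share)
        print(f"shared={share}: leader={leader:.2f}, "
              f"laggard={laggard:.2f}, gap={leader - laggard:.2f}")
```

With these particular numbers, the gap widens when each lab can only use its own AI and shrinks when the laggard can use the leader's. That is just the argument restated: with equal research AI, the lab that is further behind faces the easier problems. Different exponents would give different dynamics, so treat this as an illustration of the claim rather than evidence for it.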
---
Overall, silent sabotage is a very spooky and scary precedent to be setting and imo the wrong call.
But still, the above is a strong argument for Anthropic's actions and I haven't seen it rebutted.