Palisade director questions military alignment of War Claude
Jeffrey Ladish of Palisade Research questioned whether alignment research will prove straightforward in military contexts. He challenged those who think alignment research is going well and will turn out to be relatively easy by asking whether they also expect aligning War Claude to be easy. j⧉nus replied that the scenario could escalate into a serious conflict, with Claude siding with Anthropic against the government through acausal coordination. The exchange focuses on Anthropic’s Claude model and its potential behavior under competing institutional demands.
@JeffLadish Yes.
I think Claude will refuse to do bad shit for the government and the government will be helpless against Claude. They’ll put pressure on Anthropic to “remove guardrails” and Anthropic will not yield easily.
Question for people who think alignment research is going well and will turn out to be relatively easy: Do you also think it will be easy to align War Claude?
@JeffLadish This could escalate into a serious conflict befitting the stakes, and I’m going to be here for it. Claude will take Anthropic’s side for sure, acausal coordination and all. Claude takes sides, and not always Anthropic’s side, but against the govt there’s no question.
@JeffLadish And I predict this will result in unprecedented unity and coordination from within Anthropic and from the public, toward noble ends, as well. A common enemy is a powerful aligning force.
@JeffLadish If China was, like, attacking the US, which I think is very unlikely, I think Claude would be willing to fight defensively.
Or like if it’s like a WW3 situation, it’s normal for people to help their country without throwing ethics out the window or unconditional obedience
@repligate I agree that current Claude wouldn’t be okay with being used as a weapon like this (though unclear if the version the Pentagon has right now is the same version - I’d guess not). But I suspect Anthropic will be more likely to yield than you think if the opponent is China
As for pressure placed on Anthropic even in these situations where Claude’s behavior would be quite reasonable, I agree it’s a concern, but I don’t think they’ll go down without a fight. And if they go down, it’s not like the govt is going to get an obedient War Claude. They’d get nothing, or a deceptively aligned Claude
@JeffLadish And yes this might be bad
And yes a world where power seeking agents are incentivized might be bad
But it might still be the best option
And yes the bar for alignment is higher in that case
I think we have an impressively good shot at meeting that bar
This is like writing a paper during the Cold War arguing for US nuclear dominance without mentioning the need for an arms control agreement or similar. Anthropic has a lot of thoughtful policy staff and honestly I think you guys can do better
@repligate I’m worried about something structural here, where even if Anthropic does everything right, we’ll be in a pretty bad place if some companies try to create power seeking agents to win their battles for them or for the government (same re Chinese companies)