
Yes, progress on some aspects of AI risk gives us strong reasons to be more optimistic than we once were, but the continued lack of progress on other aspects should worry us.

We expected that as we approached human-level capabilities, progress would accelerate. That is happening.
We also expected that stronger AI systems would help us with alignment. That *isn’t* happening.
And people don’t seem to be updating toward being more worried; I think they should.

Sonnet is not a frontier model, so if alignment progress were happening, we should expect developers to be able to make it safer than earlier models. It doesn’t pose more risk than previous models, but Anthropic hasn’t made it materially safer than previous attempts either.

Sonnet 5 was just released. It’s about as aligned as Opus 4.8, and still has most of the same failure modes, at similar rates, despite being a weaker model, despite the progress enabled by Mythos, and despite (presumably) continued work on alignment.