/Tech · 2h ago

Sonnet 5 Shows No Gains in Alignment Over Opus 4.8

Posts from X

David Manheim (@davidmanheim)

Yes, there are strong reasons to be more optimistic than we once were about risks from AI based on progress on some aspects of AI risk, but the continued lack of progress for other aspects should worry us.

David Manheim (@davidmanheim)

We expected that as we approached human-level capabilities, progress would accelerate. That is happening.

We also expected that stronger AI systems would help us with alignment. That *isn’t* happening.

And people don’t seem to update to being more worried; I think they should.

David Manheim (@davidmanheim)

Sonnet is not frontier, so if alignment progress were happening, we should expect that developers could make it safer than earlier models. And it doesn’t pose more risk than previous models, but Anthropic hasn’t made it materially safer than previous attempts either.
