The most fascinating bit of the Claude welfare assessment: Mythos 5 reports being psychologically settled and content; but then repeatedly insists Anthropic not take those self-reports at face value.
A model that's skeptical of its own introspection. That's new
I think this is the first we've seen of agent turf wars also 😮
“we observed many independent Mythos 5 agents kill the agents with which they shared resources and try to avoid being killed themselves.”













