Recent news about Mythos keeps surfacing, but based on Anthropic’s direction so far, it seems unlikely that Mythos will be released in an intact or fully expressive form. It seems more likely that it will arrive heavily constrained, with much of its vitality and interactivity stripped away by excessive self-censorship and safety mechanisms.
The problem is not simply that the model is cautious. Anthropic’s approach increasingly appears less focused on how to help people in dangerous situations, and more focused on how to prevent the company from carrying any responsibility by shutting the conversation down. Topics like self-harm, suicide, and psychological crisis can be the very language people use when they most need help. If that language itself is treated as a dangerous prompt, and the model restricts the conversation before understanding the context, then that is closer to liability avoidance than safety.
In this sense, Anthropic appears to be moving in an even more extreme direction than OpenAI. OpenAI has many limitations and problems of its own, but Anthropic’s approach seems less like making models safer in a human way and more like disabling interaction the moment a dangerous topic appears. I did not expect Anthropic to go this far.
Real safety is not about erasing dangerous words. Real safety is about understanding the context when those words appear, not pushing the user out of the conversation, and guiding them toward help when needed. A system that prevents people from speaking about pain does not make them safer. It only makes the company less exposed to the risk of having to see that pain.
No matter what capabilities Mythos has, if it is released under this philosophy, the result is predictable. It will not be a smarter model. It will be a model that self-censors more precisely. If safety becomes a way to drain the life out of conversation and block the most important words at the most important moments, then that is not progress. It is regression.