Seeing a lot of Fable safeguards hate on the timeline, but "what did y'all think [AI safety] meant? vibes? papers? essays?"
The reality is that there are real tradeoffs in AI safety. Anthropic deserves credit for aggressively resolving these tradeoffs in favor of safeguards for a model that it believes to be (and that in fact is) a step-change in vulnerability research capability. It's kind of difficult to justify coercive proactive harm mitigation, especially in a libertarian-ish society, but we clearly see the value in mandatory vaccination programs or beat-cop policing or surveillance cameras. We should applaud Anthropic for being one of the few institutions in American public life that actually follows through on its convictions, including by implementing really aggressive monitoring, squelching of AI development work (already accounted for in its ToS -- I think the clandestinity is cool too), and exclusionary limits on use for information security-related queries.
The whole point is that we do not have herd immunity here: our network edge devices, authentication apps/services, and productivity software are extremely vulnerable, not sandboxed, and lack introspection capabilities. We need programs like Glasswing, better cross-company threat detection, and a more effective APT exploitation strategy before we democratize such a robust vuln research capability. The counterfactual is that MSS contractors use VPSes to access Fable, find jailbreaks for weaker safeguards, and use the system to build an Active Directory exploit that enables remote access to every O365 app. Not so bueno, huh?
This is incredibly hard; Anthropic may not have calibrated every safeguard correctly this time, but there'll be learning. Model release cycles are getting shorter: Anthropic will adapt as it better understands and mitigates risks and as competitive pressures manifest. Histrionic claims of anti-competitive behavior and safetyist hysteria fall victim to precisely the error they allege.
mythos will be bad ON PURPOSE on ai "frontier llm research" tasks, this is very very sad for the research community
also the fact that this is on purpose not visible to the user is crazy














