Anthropic’s new Fable 5 safeguards are fascinating.
When the model is used for frontier LLM development, it apparently does not simply refuse or warn the user. Instead, it quietly limits its own effectiveness through techniques like prompt modification, steering vectors, and parameter-efficient fine-tuning (PEFT).
That means Claude may still answer but become deliberately less useful for building frontier AI systems, pretraining pipelines, distributed training infrastructure, or ML accelerators.
Anthropic says this should affect only around 0.03% of traffic, but the precedent is big: models are being selectively capability-throttled in strategically sensitive domains.