You can't distill a model if you don't know when it's lying to you.
Anthropic's Fable has new restrictions on frontier AI research requests.
Unlike bio safeguards that visibly kick you to a different model, these are silent. The model just quietly sandbags.
We asked @fleetingbits why.
"It's interesting to ask why they did that... my best guess is this is aimed against Chinese labs."
"It's an anti-distillation measure because it's not telling you when it's sandbagging, you don't know when you're getting bad data."

