Tech · 12h ago

ALTER founder David Manheim says Fable's safety filters paused legitimate research by flagging queries on AI alignment and control

Story Overview

Anthropic's Claude Fable 5 launch introduced separate safety classifiers that detect potential misuse in cybersecurity and biology, as well as model-distillation attempts, and automatically route flagged queries to an older Opus model instead of refusing outright. This setup paused chats for AI safety researcher David Manheim when he asked about topics such as AI control protocols, stochastic games, evolutionary dynamics, and formal verification of alignment, even though the queries involved legitimate arXiv papers rather than harmful applications.

Original post
rohan anil @_arohan_ · #86 in Tech

I am curious if the new Fable safegaurds would trigger accidentally on meaningfully important work as a false positive that has a real life consequence.

Just realizing this is eroding trust and please change this to just down right blocking which is a fine position to take. Respect the user’s time.

2:44 AM · Jun 10, 2026 · 7.5K Views

Research Friction

Broad filters create friction for alignment work

The classifiers triggered on queries touching high-risk domains even when the actual intent centered on theoretical safety research, producing the kind of false positives Rohan Anil warned could erode user trust over time.

Open Question

Fallback routing raises questions about user time

Anil argued that routing to a weaker model wastes researchers' effort without adding safety value and suggested that outright blocks would at least respect users' time. False-positive rates for alignment topics remain unstated; Anthropic has offered only an early aggregate claim of a trigger rate under 5 percent overall.

Sentiment

Many users criticized Fable's safeguards for flagging legitimate AI safety research, arguing that the false positives and opaque decisions erode trust. Some said the restrictions appear aimed at giving Anthropic monopoly control.

Pos 0.0% · Neg 100.0% (7 comments with sentiment)
Cluster Engagement
Posts from X
Most Activity
VIEWS 373
David Manheim @davidmanheim

I assumed that the complaints that Fable won't do AI safety research must have been exaggerated, but it took about 3 minutes to trigger the red flag when looking at potential research ideas.

1h · Views 373 · Likes 3 · Bookmarks 0
BOOKMARKS 1
Abliteration.ai @abliteration_ai

@founderengineer @TechCrunch

3h · Views 3 · Likes 1 · Bookmarks 1
LIKES 9
Shannon Sands @max_paperclips

@far__el assuming you even believe the 0.03 figure. it seems to be a hair trigger, gotta be a lot higher

14h · Views 171 · Likes 9
RETWEETS 13
Far El @far__el

Everyone should be as vocal as possible about this unethical Fable safetyist restrictions on AI research TODAY. Even if it doesn’t impact you and it’s only targeting 0.03% of traffic, this is setting a precedent that will later impact most if not all usage. This goes beyond censorship, this is control and gaslighting like we have never seen before. Super dystopian.

14h · Views 4.3K · Likes 183 · Bookmarks 4
REPLIES 1
Maggy @wentzel456893

@TechCrunch

3h · Views 12 · Likes 1
Founder Engineer @founderengineer

@TechCrunch They should try: https://abliteration.ai/#chat

3h · Views 24 · Likes 3 · Bookmarks 1
David Manheim @davidmanheim

I assumed that the complaints that Fable won't do AI safety research must have been exaggerated, but it took about 3 minutes to trigger the red flag when looking at potential research ideas.

12h · Views 2.8K · Likes 52 · Bookmarks 8
Yash @yash1_

@_arohan_ It is actually doing it (trigger accidentally on meaningful important work as a false positive) right now, as we see many cases on the feed

7h · Views 172 · Likes 1
Lumin @luminxbt

@far__el feels like the old slippery slope argument but this time it might actually hold water

whats the endgame here exactly?

13h · Views 59
Evan Conway @evanjayconway

@davidmanheim Isn’t this not actually triggering the AI research safeguards? This looks like you triggered the bio safeguards because one of the papers it searched up was on protein structure evolution. The AI research safeguard is a silent downgrade

7h · Views 6 · Likes 1
Joergen Jore @JoergenHJore

@_arohan_ Claude refuses to work on my master thesis on model alignment. Either fable 5 straight up refuses or it switches to a dummer model, which then also refuses

9h · Views 324 · Likes 1
Ornias @OrniasDMF

@_arohan_ Anthropic wants to decide what's good for you and what's safe. They want a monopoly on AI so they can make all the money and all the decisions. They love the idea of locking out anybody they consider dangerous. Imagine if the internet or electricity was handled like this.

10h · Views 322
AI News @AI_N3ws

@_arohan_ I would bot trust them if i they where not my costumer, you truly dont know how fare that goes, corupting tools compaditors use.

9h · Views 200
Far El @far__el

@luminxbt There is no logic to this argument, simply refuse my request if you are safetyist. But to sabotage? That is unacceptable

13h · Views 51 · Likes 1
Simon @SimonTheNoob

@_arohan_ Literally any questions about biology trigger them. It can't even analyze my bloodwork! It can't even explain what a mitochondria is!

7h · Views 88
Mayz @lunan_ai

@_arohan_ so ur saying the safeguard is worse than just blocking outright because at least blocking is honest about it

9h · Views 65
Reeve @reevefomo

@_arohan_ the trust erosion is the real problem. false positives feel worse than honest censorship when the stakes are actual.

8h · Views 57
DrakeN 魔人 @draken1721

@TechCrunch sounds like they want fewer restrictions on dangerous tech

4h · Views 28

@far__el It’s the modern day, “Don’t read this book or it may give you bad ideas.”

12h · Views 24