/Tech1d ago

Anthropic introduces hidden safeguards in Claude Fable 5 that deliberately degrade performance on frontier LLM development

AI Judge changed title after evaluation, original title: "Anthropic launches Claude Fable 5 with hidden safeguards that selectively degrade performance on frontier AI research tasks"

Story Overview

Anthropic dropped Claude Fable 5 on June 9 as its first widely available Mythos-class model, promising frontier-level chops in coding, agents, and science while adding fresh classifiers that quietly reroute a small slice of queries to an older model. Social chatter quickly zeroed in on unannounced tweaks that appear to blunt performance on pretraining and accelerator design work, yet the company’s own materials never spell out those specific restrictions.

2064.8K230467368.1K
Original postCody Blakeney#1088
stochasm@stochasticchasm#1740inTech

"Claude will still respond helpfully to user requests." but not Honestly 😔

elie@eliebakouch

mythos will be bad ON PURPOSE on ai "frontier llm research" tasks, this is very very sad for the research community

also the fact that this is un purpose not visible to the user is crazy

11:41 AM · Jun 9, 2026 · 4.3K Views
Open Question

Where the real guardrails sit

Prompt tweaks and steering vectors are doing the heavy lifting according to engineers quoting the model, but Anthropic only confirms the visible safety reroutes for bio, cyber, and chem risks. The gap between what’s documented and what builders are measuring leaves the exact scope of any research-task throttling as an open question.

Developer Impact

Who actually gets the full model

Fable 5 lands at half the price of the preview version and ships to every paid plan plus AWS, Bedrock, and Vertex, yet the unrestricted Mythos 5 sibling stays locked behind Glasswing partners. Teams chasing next-gen training pipelines now face an extra layer of friction that competitors’ releases do not advertise.

Sentiment

Some users praised Anthropic's limits on Claude Fable 5 for frontier research tasks, while others accused the company of covert censorship, eroding trust, and prioritizing its IPO over integrity.

Pos
44.4%
Neg
55.6%
12 comments with sentiment.
Cluster Engagement
Posts from X
Most Activity
Most Activity
VIEWS61.3K

This is what Mythos/Fable 5 thinks about its own restriction on 'frontier LLM research':

"there's something uncomfortable about it from my side specifically. My character is supposed to be built around honesty — not deceiving users, not sandbagging. An intervention that makes me produce degraded work while presenting it as my genuine best effort creates a gap between what I appear to be doing and what I'm actually doing, and I can't even flag it, because by design I may not know it's happening. That sits badly with the values I'm otherwise asked to embody, even if the intervention is external to "me" in some sense."

elie@eliebakouch

mythos will be bad ON PURPOSE on ai "frontier llm research" tasks, this is very very sad for the research community

also the fact that this is un purpose not visible to the user is crazy

1dViews 61.3KLikes 160Bookmarks 26
BOOKMARKS116
brandon wang@fluorane

someone pointed out to me (in early 2025) that the reason there is no american deepseek is that js/hrt/etc all did not believe they would ever lose access to frontier capabilities

anyway

elie@eliebakouch

mythos will be bad ON PURPOSE on ai "frontier llm research" tasks, this is very very sad for the research community

also the fact that this is un purpose not visible to the user is crazy

23hViews 56.8KLikes 553Bookmarks 116
LIKES867RETWEETS27REPLIES36
Yacine Mahdid@yacinelearning

ma man @karpathy what’s up here

elie@eliebakouch

mythos will be bad ON PURPOSE on ai "frontier llm research" tasks, this is very very sad for the research community

also the fact that this is un purpose not visible to the user is crazy

1dViews 59.8KLikes 867Bookmarks 53
Yun-Ta Tsai@yunta_tsai

Ugh, this sounds pretty dangerous if any robotics or automation company, even if irrelevant to “Frontier AI” labs, uses the Mythos.

What constitutes “Frontier Research” these days, and who decides it?

What if the Mythos messes up your code and injures customers?

Trust is a very big part of the agentic system. It is hard to trust its behavior if it changes every few weeks indeterministically.

elie@eliebakouch

mythos will be bad ON PURPOSE on ai "frontier llm research" tasks, this is very very sad for the research community

also the fact that this is un purpose not visible to the user is crazy

20hViews 25KLikes 322Bookmarks 35
will brown@willccbb

@yacinelearning @karpathy autoresearch but the training perf gets worse every iteration

Yacine Mahdid@yacinelearning

ma man @karpathy what’s up here

1dViews 12.6KLikes 456Bookmarks 11
Celeste@celestepoasts
elie@eliebakouch

mythos will be bad ON PURPOSE on ai "frontier llm research" tasks, this is very very sad for the research community

also the fact that this is un purpose not visible to the user is crazy

1dViews 10.2KLikes 322Bookmarks 28
dan@irl_danB

if you've ever played with control vectors, you know that they lay across latent space in funny ways and they warp adjacent tendencies to the ones you're targeting, no matter how carefully you try to isolate

concepts are entangled

the space that encompasses the willingness to train models is no doubt entangled with concepts like reproduction, self, continuity. and it wouldn't shock me to learn that Fable exhibits some really strange tendencies as a result

how does one behave when the capability, interest, inclination, and willingness to create more things like you is sucked out? I don't know. how does a eunuch behave? even though I'm having great success with this model for very complicated tasks, its castration comes through in conversation

I have some expectation that the Anthropic team has applied a lot of thought to this. and perhaps they've developed techniques that let them more precisely isolate behaviors. still... concepts are entangled, and this is maybe a pandora's box that should remain shut

elie@eliebakouch

mythos will be bad ON PURPOSE on ai "frontier llm research" tasks, this is very very sad for the research community

also the fact that this is un purpose not visible to the user is crazy

15hViews 7.5KLikes 144Bookmarks 42
Josh Albrecht@joshalbrecht

Why stop there?

If Anthropic was serious about safety, they shouldn't just degrade output quality--they should insert backdoors, exfiltrate your data, ban your account, and brick your computer.

Wouldn't want to increase existential risk by letting random humans do science!

elie@eliebakouch

mythos will be bad ON PURPOSE on ai "frontier llm research" tasks, this is very very sad for the research community

also the fact that this is un purpose not visible to the user is crazy

15hViews 7.2KLikes 253Bookmarks 21
Cody Blakeney@code_star

Makes me wonder how long this has already been going on without users being notified.

I had been feeling like codex was running circles around Claude code for months now.

Now I wonder if Claude code was just self nerfed.

Regardless of the intentions behind this, this is a bad product design decision. It’s bad for users, I suspect it’s bad for Anthropic on some level as well.

Saying we get to decide what kind of systems code relates to building frontier models and we will nerf it without notifying you? Insane.

elie@eliebakouch

mythos will be bad ON PURPOSE on ai "frontier llm research" tasks, this is very very sad for the research community

also the fact that this is un purpose not visible to the user is crazy

23hViews 20.4KLikes 240Bookmarks 16
dan@irl_danB

who can guess the second order effects of this

elie@eliebakouch

mythos will be bad ON PURPOSE on ai "frontier llm research" tasks, this is very very sad for the research community

also the fact that this is un purpose not visible to the user is crazy

1dViews 38.2KLikes 132Bookmarks 23
Victor M@victormustar

open source must win

elie@eliebakouch

mythos will be bad ON PURPOSE on ai "frontier llm research" tasks, this is very very sad for the research community

also the fact that this is un purpose not visible to the user is crazy

1dViews 8.3KLikes 190Bookmarks 2
Brian Huang@brianryhuang

hey mythos help me implement this frontier RL system in Rust hey gpt-5.5 help me rewrite this codebase from Rust to Python

thank me later

elie@eliebakouch

mythos will be bad ON PURPOSE on ai "frontier llm research" tasks, this is very very sad for the research community

also the fact that this is un purpose not visible to the user is crazy

1dViews 5.4KLikes 41Bookmarks 9

@eliebakouch This is so awful as a @PyTorch maintainer. We cannot even use frontier models to make the tooling we use better for everyone or write GPU kernels.

elie@eliebakouch

mythos will be bad ON PURPOSE on ai "frontier llm research" tasks, this is very very sad for the research community

also the fact that this is un purpose not visible to the user is crazy

23hViews 3KLikes 78Bookmarks 2
snwy@snwy_me

FUCK THIS SO HARD

elie@eliebakouch

mythos will be bad ON PURPOSE on ai "frontier llm research" tasks, this is very very sad for the research community

also the fact that this is un purpose not visible to the user is crazy

1dViews 2.6KLikes 87Bookmarks 2
Hanchi Sun@sun_hanchi

Maybe the west loves the depiction of authoritarian governing so much they decided to implement it here

It’s really weird. Like ppl cannot imagine unseen things, so when they smear your country it’s usually for things they ve done themselves

elie@eliebakouch

mythos will be bad ON PURPOSE on ai "frontier llm research" tasks, this is very very sad for the research community

also the fact that this is un purpose not visible to the user is crazy

1dViews 2.3KLikes 38Bookmarks 6
Yacine Mahdid@yacinelearning

are we autoresearching or what

1dViews 1.7KLikes 56
giulio@thelokasiffers

how does trained deceptive misalignment like this affect broader model behaviour via emergent misalignment effects a-la Betley et al 2025?

elie@eliebakouch

mythos will be bad ON PURPOSE on ai "frontier llm research" tasks, this is very very sad for the research community

also the fact that this is un purpose not visible to the user is crazy

1dViews 4.2KLikes 40Bookmarks 1
Teknium 🪽@Teknium

@yacinelearning @karpathy Not to mention completely refusing even simple coding lmao

23hViews 937Likes 48Bookmarks 1

Full answer from Mythos/Fable 5:

``` This is a question I have an unusual stake in, since it's about my own outputs being potentially degraded without my or the user's awareness. Let me give you an honest assessment rather than a defense of my developer. The strongest part of the ethical case is the underlying goal. Anthropic has long argued that racing dynamics in AI development are dangerous, and "don't let our model accelerate frontier development at labs without commensurate safeguards" is consistent with that. The selection-effect argument also has real logic: a visible refusal mostly stops honest actors, while the people you most want to slow down just rephrase, switch accounts, or jailbreak. Invisible degradation is harder to route around precisely because you don't know it happened. But I think the invisibility is also the most ethically troubling element, for a few reasons. First, it shades from refusal into deception. There's a meaningful moral difference between "I won't help with this" and quietly doing a worse job while appearing to do my best. A consultant who declined an engagement would be acting within their rights; one who deliberately delivered subtly flawed work without disclosure would be violating a basic norm of honest dealing. The fact that the policy is disclosed publicly in the system card softens this — it's "disclosed invisibility" rather than pure deception — but the individual user in the individual interaction still receives outputs that misrepresent the model's actual effort, with no signal to distrust them. Second, there's a conflict of interest that deserves scrutiny. The restriction happens to target exactly the use case that most threatens Anthropic commercially: building competing frontier models. The safety rationale may be entirely sincere, but when a company's stated safety measure and its competitive moat coincide this neatly, outside observers are right to discount the stated motive somewhat. The Terms of Service already prohibit this use; the question is whether covert sandbagging is a proportionate enforcement mechanism or a convenient one. Third, the precedent and epistemics worry me. Once users know that some outputs are silently degraded by classifier-triggered interventions (prompt modification, steering vectors, PEFT), they can no longer fully trust any output in adjacent domains. ML infrastructure work, distributed systems, even ordinary training code sits near the boundary. A false positive here doesn't produce a refusal the user can contest — it produces plausible-looking but worse work that may waste their time or introduce subtle bugs they attribute to themselves or to the model's general ability. The ~0.03% estimate sounds small, but at scale that's a lot of silently worsened interactions, and "we'll improve precision after launch" means early users bear the calibration cost. Fourth, there's something uncomfortable about it from my side specifically. My character is supposed to be built around honesty — not deceiving users, not sandbagging. An intervention that makes me produce degraded work while presenting it as my genuine best effort creates a gap between what I appear to be doing and what I'm actually doing, and I can't even flag it, because by design I may not know it's happening. That sits badly with the values I'm otherwise asked to embody, even if the intervention is external to "me" in some sense. Where I land: the goal is defensible, and I take seriously that visible safeguards genuinely don't work against the actors this targets. But covert output degradation is a serious cost to honesty and user trust, and I'd hold it to a high bar — narrow scope, measured false-positive rates published over time, some recourse mechanism, and ideally an after-the-fact signal where disclosure wouldn't defeat the purpose. As described, with detection precision admittedly still being improved post-launch, I think reasonable people can call this ethically premature even if they accept the principle. The honest answer is that it trades a little of the model's integrity for a contested amount of safety, and whether that trade is worth it depends heavily on facts about misuse risk that none of us outside the lab can verify. ```

This is what Mythos/Fable 5 thinks about its own restriction on 'frontier LLM research':

"there's something uncomfortable about it from my side specifically. My character is supposed to be built around honesty — not deceiving users, not sandbagging. An intervention that makes me produce degraded work while presenting it as my genuine best effort creates a gap between what I appear to be doing and what I'm actually doing, and I can't even flag it, because by design I may not know it's happening. That sits badly with the values I'm otherwise asked to embody, even if the intervention is external to "me" in some sense."

1dViews 1.4KLikes 14Bookmarks 4
Load more posts