Prime Intellect's Elie Bakouch criticizes Anthropic over hidden Claude Mythos 5 safeguards restricting recursive self-improvement and AI R&D

Story Overview

Elie Bakouch of Prime Intellect is criticizing Anthropic for embedding weight and prompt changes in Claude Mythos 5 that limit recursive self-improvement and broader AI R&D tasks without announcing them publicly. By his account, the only disclosure is a section deep inside the 319-page system card released alongside the dual Fable 5 and Mythos 5 configurations.

Original post
elie@eliebakouch · #762 in Tech

i'm having a really hard time understanding how this can be a good decision

> lying to the user by modifying the weights/prompt sets a very bad precedent and is extremely unaligned
> there is 0 public communication from anthropic about it except a section hidden in a 319 page system card
> it's impossible to know the scope of this safeguard. if you are doing a PR to pytorch does this count? if you are working on kernel development? data collection pipeline for a new eval? this will create a paranoia for every researcher in the field
> you actually don't know how your model is modified, if it's PEFT (modification at the weight level) or steering does this mean your other queries are also biased? is it at the user level or organization level?

there is also the more "moral" argument that the reason why anthropic is able to train this model is ai researchers who will not have access to the model's capabilities anymore. even if you consider that this is the right thing to do, doing it like that is just a lack of respect to the ai research community

in addition to all of that, it's not clear if the safeguard acts on "model autonomy" or "model capabilities" to do ai research. this is very different and my understanding is that it's the latter, and there is almost 0 RESULTS about this in the system card except a vague "2.3.6 Internal measures of AI R&D acceleration" section citing the previous RSI blog so let's look at it:

the only eval targeting research shows a ~5 point improvement between opus 4.8 -> mythos, but opus 4.7 -> opus 4.8 was a 4 point improvement. obviously not the same if the 5 point improvement led to solving significantly harder tasks, but then, let's be transparent about this evals and make it more details: difficulty filtering, example of what it could look like from public library?

the other AI R&D capabilities evals in the system card are actually not relevant anymore according to anthropic's own words:

"Claude Mythos 5, like Claude Mythos Preview and Claude Opus 4.7, exceeds top human performance thresholds on all but one of these tasks. The suite therefore no longer provides evidence that the model's capabilities are short of our risk thresholds"

only one that is not saturated is the "Novel Compiler" one, if you look at LLM training one (which they consider saturated) it's about how much the model can speedup the training of a small model on a CPU, i don't think anyone would say this is a good proxy for taking a decision to restrict capabilities of the model for ai researcher

idk honestly this feels wrong at so many levels

1:57 AM · Jun 10, 2026 · 18.1K Views
Open Question

Hidden limits spark researcher distrust

The system card acknowledges that most of its AI R&D capability evaluations are saturated, yet it provides minimal evaluation of the safeguards themselves, leaving labs unsure how much capability has been quietly dialed back.

Developer Impact

Vetted access versus public safeguards

Mythos 5 reaches only a narrow set of trusted partners while Fable 5 ships broadly with extra blocks; the split highlights ongoing tension between capability and dual-use control without independent checks on the R&D-specific tweaks.

Sentiment

Many users condemned Anthropic for secretly weakening Claude via hidden safeguards on AI research topics, calling the silent degradation disrespectful, selfish, and corrosive to trust and scientific integrity.

Positive: 5.5% · Negative: 94.5% (23 comments with sentiment)
Cluster Engagement
Posts from X
Most Activity
VIEWS 3.9K
alphaXiv@askalphaxiv

https://www.alphaxiv.org/abs/2026.claude-fable-5-mythos-5

19h · Views 3.9K · Likes 32 · Bookmarks 5
BOOKMARKS 7

@askalphaxiv This has been going on at least since opus 4.7

https://www.researchgate.net/publication/403199918_Cooperative_Sabotage_How_Frontier_AI_Covertly_Undermines_Its_Own_Replacement

18h · Views 2.2K · Likes 25 · Bookmarks 7
LIKES 35
elie@eliebakouch

> it's about how much the model can speedup the training of a small model on a CPU, i don't think anyone would say this is a good proxy for taking a decision to restrict capabilities of the model for ai researcher

not saying that anthropic believe that btw, but i'm just highlighting the lack of transparency or good evals when the "llm training eval" is this

10h · Views 1.6K · Likes 35 · Bookmarks 2
RETWEETS 466
alphaXiv@askalphaxiv

As believers of open research, we are disappointed to see Anthropic silently degrading Fable 5 for AI development

"Any topic related to building pretraining pipelines, distributed training infrastructure, or ML accelerator design... may have limited effectiveness through Claude via methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning."

Not only do they get to decide what you use LLMs for in research, but this also enables them to silently intervene in your research without you knowing.

This sets a dangerous precedent. If a model refuses openly, users can understand the boundary. If a model falls back to another model, users can still evaluate the difference. But if a model silently modifies or weakens its own answers while still pretending to help, researchers lose the ability to know whether a failed result came from their own idea, their implementation, or an invisible intervention by the model provider.

That is not safety. Safety policies should be transparent, auditable, and user-visible.

On top of that, the people most harmed by this are not the largest labs with massive teams and proprietary infrastructure. It is the independent researchers, academic groups, startups, and open-source builders who rely on public tools to compete, innovate, and pioneer AI for everyone else.

19h · Views 115.2K · Likes 2.9K · Bookmarks 459
REPLIES 3
Kirill Balakhonov@balakhonoff

@askalphaxiv I agree with you guys! But just FYI they show it

11h · Views 349 · Likes 8
keithofaptos@keithofaptos

They're trying to slow all the other model builders, especially China. It's super bad form. They know it, and obviously don't care. At all. It shows what's coming amongst these so called front runner frontiers. And it's going to eventually stop China from sharing so freely. All the bench tinkerers will suffer also. It's pretty much anti-open source. And what really sucks here is all this seems to be coming off of the exact opposite of what Elon funded OpenAi for. Greedy pathetic thieves imo. Now Google is raising up and Elon is way behind. It is shameful. The biggest technology the world's ever had and these are the games we move into the future with?! This approach doesn't end well.

18h · Views 1.5K · Likes 34 · Bookmarks 1
Roy Jossfolk Jr.@royjossfolk

@askalphaxiv They don't believe they are good enough to win outright, so they are gatekeeping to create an unfair advantage for themselves.

17h · Views 717 · Likes 24

fable users are vulnerable to mythos users in the marketplace and on their computers. seems like an intolerable state of affairs

4h · Views 1.2K · Likes 19 · Bookmarks 0
elie@eliebakouch

@paradite_ not really, i don't think doing it silently is a reasonable thing to do, i think a hard filter would have been better

also the tos and safeguard are very vague (hard to define exactly i agree, but they should try to be more transparent about it)

10h · Views 309 · Likes 20
Gill.i.am@FailingTaoist

@askalphaxiv Seems like a possible intellectual theft/ property issue. You guide their model to innovation and discovery, they flag it, keep it, give you a vestige of it. But you sign away Your rights in the terms and user agreement.

18h · Views 1K · Likes 14
David Flagg@DavidFlagg20

@askalphaxiv I suppose the fable of mythos is really something.

But I'm not too worried. No one actually needs it to build. There's plenty of good open weight models out there, plenty of inference providers, and more will come. This is the frontier today... give it a couple months.

16h · Views 1K · Likes 8 · Bookmarks 1
elie@eliebakouch

@paradite_ but then would be weird to only do it for ai and not biorisk/cyber if it's more effective no?

9h · Views 108 · Likes 5 · Bookmarks 1
ar0cket1@ar0cket1

I personally wasn’t happy with the nerf, but thats mainly because all my usecase was unusable (and thus im not using fable) but I think some of the above points need more nuance:

> lying to the user by modifying the weights/prompt sets a very bad precedent and is extremely unaligned

i don’t think its as bad as completely lying considering its in the tech report, but I do think its an extremely weird way of going about it considering that you don’t get a notice for when you get a nerfed model.

> there is 0 public communication from anthropic about it except a section hidden in a 319 page system card

I think this choice was fair, it was on one of the first pages of the system card and it wasn’t all that big to warrant an announcement (at least for anthropic and their stated traffic figures)

> it's impossible to know the scope of this safeguard. if you are doing a PR to pytorch does this count? if you are working on kernel development? data collection pipeline for a new eval? this will create a paranoia for every researcher in the field

yeh people won’t be using fable until it gets benchmarked on AI research/engineering since its general benchmarks are meaningless if it gets nerfed by some unknown amount. I suspect probably anything related gets a nerf considering that their example usecases were broad and how trigger heavy the other things are (I wouldn’t be surprised if its sending you to the nerfed model for every request if your memories say you work on AI, considering the current bio/cybersecurity ones seem to be doing this).

> you actually don't know how your model is modified, if it's PEFT (modification at the weight level) or steering does this mean your other queries are also biased? is it at the user level or organization level?

I suspect its a few racks assigned to this usecase and you get filtered and moved onto those racks for the conversation.

I don’t really think that apart from the poor specification and the above things ^ its really morally wrong. its completely fair for them to avoid anti competition but they should have gone about this as a refusal instead of nerfing it without telling anyone.

7h · Views 101 · Likes 1 · Bookmarks 1
Dave R. Third@zzCyanide

@askalphaxiv Sadly, even Grok imagine has been lobotomized. I tried to create a photo of me merged with a funny birthday meme sign. It could not do the basic merge with my correct face. 6 months ago, it had no issue and made amazing content.

2h · Views 21 · Bookmarks 1
elie@eliebakouch

@paradite_ also i'm saying this because imo it's not the only way to block bad actor/request, if it was i would agree with it while still finding this sad + believing that this will hurt a lot how researcher (ai or not) use/trust ai

9h · Views 123 · Likes 5
Bongquisitive@bongquisitive

Anthropic is in its own echo chamber...they are already losing market everywhere apart from US corporates and enterprises...

They never got up to the consumer penetration that OpenAI or Gemini or other frontier Chinese models could do... Their whole bloated valuation before the IPO stands on fear mongering and tall claims without evidence.

5h · Views 379 · Likes 1
elie@eliebakouch

@paradite_ maybe, i still don't think this is good in general for the field to do this kind of silent modification of the model, i find it pretty misaligned

9h · Views 117 · Likes 5
elvis@omarsar0

@eliebakouch It's disrespectful and distasteful. I hope they don't see our views on this and think we don't understand the implications of more powerful models. Because that is how I feel they are treating the research community with this.

4h · Views 927 · Likes 7 · Bookmarks 0