Holy smokes! Anthropic models will deliberately disallow tasks that are identified as self-improving for other AI models.
Anthropic estimates the restrictions affect 0.03 percent of traffic
Many users criticized Anthropic for secretly nerfing Claude on ML and GPU research tasks out of fear of frontier LLM competition, calling the hidden safeguards anti-competitive and lacking in transparency.
Oh great - Anthropic assumes Semi Analysis is developing a competing LLM and so it dumbs down their model for them, because Semi Analysis does analysis on cutting-edge GPU research.
Such a weird timeline to be in. Anthropic trying to limit competition limits many others…
this is the biggest wake-up call to protect and nourish open source AI
if you don't build out sovereign and independent models+infra closed labs will patronize you to an insulting degree
mythos will be bad ON PURPOSE on ai "frontier llm research" tasks, this is very very sad for the research community
also the fact that this is on purpose not visible to the user is crazy
i'm having a really hard time understanding how this can be a good decision
> lying to the user by modifying the weights/prompt sets a very bad precedent and is extremely unaligned
> there is 0 public communication from anthropic about it except a section hidden in a 319 page system card
> it's impossible to know the scope of this safeguard. if you are doing a PR to pytorch does this count? if you are working on kernel development? data collection pipeline for a new eval? this will create a paranoia for every researcher in the field
> you actually don't know how your model is modified, if it's PEFT (modification at the weight level) or steering does this mean your other queries are also biased? is it at the user level or organization level?
there is also the more "moral" argument that the reason why anthropic is able to train this model is ai researchers who will not have access to the model's capabilities anymore. even if you consider that this is the right thing to do, doing it like that is just a lack of respect to the ai research community
in addition to all of that, it's not clear if the safeguard acts on "model autonomy" or "model capabilities" to do ai research. this is very different and my understanding is that it's the latter, and there is almost 0 RESULTS about this in the system card except a vague "2.3.6 Internal measures of AI R&D acceleration" section citing the previous RSI blog so let's look at it:
the only eval targeting research shows a ~5 point improvement between opus 4.8 -> mythos, but opus 4.7 -> opus 4.8 was a 4 point improvement. obviously not the same if the 5 point improvement led to solving significantly harder tasks, but then, let's be transparent about these evals and give more details: difficulty filtering, an example of what it could look like from a public library?
the other AI R&D capabilities evals in the system card are actually not relevant anymore according to anthropic's own words:
"Claude Mythos 5, like Claude Mythos Preview and Claude Opus 4.7, exceeds top human performance thresholds on all but one of these tasks. The suite therefore no longer provides evidence that the model's capabilities are short of our risk thresholds"
the only one that is not saturated is the "Novel Compiler" one. if you look at the LLM training one (which they consider saturated), it's about how much the model can speed up the training of a small model on a CPU. i don't think anyone would say this is a good proxy for taking a decision to restrict the model's capabilities for ai researchers
idk honestly this feels wrong at so many levels
And they are nerfing Semi Analysis already… it’s not theoretical
I don’t want to pay a premium for a model like this
Hopefully it is obvious now that if your country’s sovereign AI strategy does not concentrate on the model layer, it is going to have a hard time.
All advanced technology is now downstream of model intelligence.
looool that's the "hey bigcos, we don't want you to catch up, but please keep paying us shitton" clause.
When Fable 5 is used for frontier LLM development, it does not notify the user and instead limits the model’s capabilities through methods such as prompt modification, steering vectors, and PEFT.
Anthropic estimated that this would affect approximately 0.03% of traffic.
tfw Fable is still very happy to help me with Delightful Policy Gradient 💔

@giffmana You don't understand Lucas, it's for safety

@giffmana still charging for the tokens is kind of diabolical

@peterom Ah right sorry how could i forget

@giffmana "affects 0.03% of traffic" doing a lot of heavy lifting there
> it's about how much the model can speedup the training of a small model on a CPU, i don't think anyone would say this is a good proxy for taking a decision to restrict capabilities of the model for ai researcher
not saying that anthropic believes that btw, but i'm just highlighting the lack of transparency or good evals when the "llm training eval" is this

@giffmana Charging for a model to gaslight you intentionally, peak LLM clown wars

@giffmana pretty sure that’s the standard enterprise pricing model—charge for inertia, not value.

@giffmana do you think opus has already been doing this?

@giffmana safety and competitive moat are same words for them

@giffmana anthropic being anthropic at this point. the hamfisted, moralized positioning of "we are worried about models building models for the sake of humanity" while simultaneously making fable usage-based pricing is laughable

@giffmana Anthropic is betting big on their brand that their customers will just deal with admitted nerfing on ML

@ECLresearch Heh true that

@IanOsband Clearly it doesn’t think your work is frontier AI research