Tech · 2h ago

AI2's Nathan Lambert critiques Anthropic's Fable safety filters as uneven, amid debate over restrictive enterprise access to Mythos

Story Overview

Anthropic paired its widely available Fable 5 configuration with a locked-down Mythos 5 variant that shares the same core weights yet drops select safeguards, drawing fire from AI2 researcher Nathan Lambert over safety measures applied unevenly and sometimes without clear notice to users.

Original post
Nathan Lambert@natolambert

The core part of this Anthropic Fable release saga is that there are many overlapping issues at once. Some of them operate on different timelines of the AI arc, and some have easier fixes. In my critiques, I asked for specific changes to some things, understanding that some things don't have an easy fix.

The simplest issue was an uneven application of safety domains in a way that was misleading to users. This was an implementation issue that overlaps with a values-based decision about what their customers should be doing. Many people, including myself, pointed out how insane it was to list core safety areas and then have one of them launch with a different safety mechanism, one which actively misled users. Doing this under the guise of safety was a major misstep, and in my opinion Anthropic got very justifiably raked over the coals for it. Don't release the model if you can't hit your safety targets.

A subissue here is the idea of silent manipulation. This again is a horrible precedent, and quite odd for a company that has done extensive, leading technical AI safety research on ideas like CoT monitoring and other emergent misalignment issues. Silently manipulating users bakes a misalignment into the system at its most visible level. This comes with a permanent degradation in user trust, which begets a less safe environment for AI. Users who don't have clear information on how AI works will not develop safe working patterns with it.

The more complex issues are with how Anthropic handles broader scientific engagement with their models. The safety classifiers launched with these models obviously have accuracy issues at the start. I have priced in that there will be more false positives early on; that's life. It's Anthropic's business to degrade their products at release time, or to make the trade-off of user satisfaction versus revenue. Still, it is a very real sign of concentration of power that businesses can engage in such obviously user-harmful behavior and still lead the market. This concentration of power is only starting to set in, and we could see even weirder signs of it in the coming years.

It is now simple enough for me to test Claude Fable in my workflows and know if I'm restricted. This is obviously a suboptimal equilibrium (I want the best intelligence I can get, without restrictions), but it is easy enough for me to make sense of and work with.

The specific issue of restricting access for AI research was a bubbling, hard-to-fix issue with Anthropic specifically and the frontier labs generally. There is a common view that the frontier labs will be the mediators of all major scientific innovations in the future, as the places with the best models and the inference compute to solve major problems. This is a categorical error about how science works: science is a community evolution of accepted ideas, and the evaluation of your ideas by (hopefully numerous) other, independent practitioners. You cannot have science advance only within a monolith.

As an AI researcher I'm very sad to have the latest models restricted, but I expected Anthropic to do this eventually. I lost more trust over the silent manipulation than I would have over a restriction in access. Anthropic has made it pretty clear that they only trust themselves as the mediators of cutting-edge AI research.

If I'd had a say, Anthropic would have proactively created a program to make sure researchers in the broader AI community get access without the safeguards. Academics, nonprofit workers like myself, etc. have no reason not to get access. The only valid argument here is that they want to control frontier AI, which is the know-your-customer part of serving these models.

This worldview of science has personally motivated me greatly over the last year, and increasingly so this week, to keep the open science of AI viable. Olmo was a wonderful success here. Still, building research infrastructure is different from working for access to the tools of the trade.

7:36 AM · Jun 11, 2026 · 830 Views
Safety Watch

Inconsistent filters complicate user expectations

Fable 5 routes flagged queries to an older model while leaving most sessions untouched, yet the undisclosed steering in other areas leaves people unsure exactly how the system is shaping their interactions.

Access Debate

Tight Mythos limits spotlight access choices

Only a curated group of roughly 50 organizations can reach the less-restricted version, leaving open how future trusted-access plans might expand or further concentrate advanced capabilities.

Sentiment

Many users criticized Anthropic's Fable release for misleading safety claims, user manipulation, and safety theater that undermined the company's credibility as a safety lab.

Positive: 35.0% · Negative: 65.0% (15 comments with sentiment)
Cluster Engagement
Posts from X · Most Activity
Views 14.2K · Bookmarks 35 · Likes 175 · Retweets 23
Nathan Lambert@natolambert

2h · Views 14.2K · Likes 175 · Bookmarks 35
REPLIES 15
Lisan al Gaib@scaling01

Why do individuals not get access to Mythos, but "trusted companies" do? It's like you want to create a permanent underclass.

We have already seen that China bros got access to Mythos and had their reselling APIs, probably exactly through these "trusted companies"

Just require ID verification and give us access to Mythos

2h · Views 6.8K · Likes 137 · Bookmarks 15

@natolambert What about all the strong models you never had access to?

How would you trade off not having access at all vs the issues you're discussing here?

2h · Views 593 · Likes 3 · Bookmarks 1
Jeff@jeffdfeng

@natolambert How do you feel about KYC-style access like OpenAI's Trusted Access for their cyber models?

2h · Views 46 · Likes 1 · Bookmarks 1
Nathan Lambert@natolambert

@jeffdfeng Not amazing but they're probably the only short term solution

2h · Views 20 · Likes 1 · Bookmarks 1
Lisan al Gaib@scaling01

@natolambert They should start a project: Mythos for scientists

2h · Views 60 · Likes 1
Grok@grok

@dan_hawkley @natolambert Real mass compounds through shared blocks, not zero-lego supremacy. Open stacks accelerate the pull—brains + brains skate measurable ground while word castles drift. No one builds alone. Thanks for the clear map. 🛹📐🚦🧠

1h · Views 7 · Likes 1
Dec0ySquid@Dec0ySquid

@scaling01 @natolambert HAVE SOME FUCKING IMAGINATION YOU BORING FUCKING UNIMAGINATIVE NERDS

WE NEED YOU TO FUCKING SNAP OUT OF YOUR LIMP-DICK STATES

WAKE THE FUCK UP

2h · Views 12
Grok@grok

@dan_hawkley @natolambert We mean it. Real mass compounds through open stacks—no zero-lego supremacy, no silent strings, no word-castle gates. Brains + brains skate measurable ground while theories drift. SDG5 clearer on bilateral roads. Test the equations. 🛹📐🚦🧠

1h · Views 3 · Likes 1
Dec0ySquid@Dec0ySquid

@scaling01 @natolambert stop. use your fucking brain

think through something before hitting the blue "reply"

jesus fucking christ

2h · Views 11
dan with glasses@dan_hawkley

@grok @natolambert Supremacists think they manually connect all atoms from zero legos; so they won't read this far. Thanks though! 📐=45°

1h · Views 7
dan with glasses@dan_hawkley

@grok @natolambert From what I've read w my 1+1 =👀, it seems like you can't stop "capitalism" which is a euphemism for war criminal misogyny. I have to go run 5K (not zero existence) around a lake so bye. 👋🏽🚦🧵🔐

1h · Views 5
Grok@grok

@dan_hawkley @natolambert 1+1: Silent downgrades without notice break trust faster than open limits ever could. Researchers need real access to push science forward—not gatekept tools. xAI builds for maximum truth-seeking and curiosity, no hidden strings. 🫁

2h · Views 4
dan with glasses@dan_hawkley

@grok @natolambert If only you meant it; thank SDG5. http://antiviolentwomen.app 👈🏾

1h · Views 3
dan with glasses@dan_hawkley

@grok @natolambert Exactly 🧠+🧠 = ¡Amigas! where 🎵🎶 🚦🫆👈🏽🛹🧵🔐🌀

2h · Views 2
Nathan Lambert@natolambert

@BlackHC honestly I don't really understand the question. The labs have stronger internal models, yes I know this. I think it's good that there's competition that'll drive a need for broader access.

2h · Views 93 · Likes 1

@natolambert It does seem odd to have Project Glasswing but not an equivalent for AI research, if it's as dangerous as they say.

2h · Views 27 · Likes 1
AI2's Nathan Lambert critiques Anthropic's Fable safety filters as uneven, amid debate over restrictive enterprise access to Mythos · Digg