Tech · 16h ago

Interpretability researcher Nick Cammarata warns Anthropic may be silently restricting model capabilities, disrupting safety research

He lacks reliable benchmarks to confirm these capability changes.

Original post

Nick@nickcammarata

i think it's bad for anthropic to nerf ml silently. I don't know if interpretability counts as frontier ai model research or not. everything i'm doing is differentially for safety, idk if i'm being nerfed, and don't have great benchmarks to tell

8:16 PM · Jun 9, 2026 · 24.9K Views

Sentiment

Some users are optimistic that the interpretability researcher will get unnerfed model access for his ML safety work, suggesting his connections may help him get around the restrictions.

Positive: 100.0% · Negative: 0.0% (2 comments with sentiment)

Posts from X

Nick@nickcammarata

unclear if the ml nerfs are permanent, would be nice for anthropic to say something on this explicitly if it is the case

Nick@nickcammarata

i think the best would be for anthropic to work with more orgs to have unnerfed models. i'm a pretty big anthropic stan but the world where you have to join anthropic specifically to do your best safety work i think is not the ideal world

Nick@nickcammarata

oh now won't let me use fable at all. i was using it for interp, mostly working (unclear if nerfed). in a separate chat i asked a random question about papayas i was curious about, it got flagged as biosecurity concern?? now won't let me talk about interp in other chats either

Nick@nickcammarata

using agents for interp is confusing enough without knowing if your agent is being probed to make it silently dumb on purpose

Nick@nickcammarata

oh if they said this i missed it, if it's only a short term thing i think that's fine, they're going through a lot and i think it's reasonable to cut them a lot of slack right now

bayes@bayeslord

@nickcammarata I’m at like >50% chance you will get unnerfed access

Nick@nickcammarata

@bayeslord bc they specifically judged my account as good or bc the work i'm doing will be judged as not frontier training

bayes@bayeslord

@nickcammarata Both are included as well as eg affiliative factors

Aaron Bergman 🔍@AaronBergman18

@nickcammarata I feel like you specifically might be well connected + good enough to get around this by talking to some people

Idk/I have no special info and don’t really know what ur working on more specifically

Liv@livgorton

@nickcammarata all safeguards will be improved

Nick@nickcammarata

@bayeslord you're right, anthropic is a well known member of the jhana community of which i am also affiliated

huli@honorablepicnic

@nickcammarata Why do you think they are doing it silently? It's interesting since they're doing it loudly in so many categories, including distillation
