i think it's bad for anthropic to nerf ml silently. I don't know if interpretability counts as frontier ai model research or not. everything i'm doing is differentially for safety, idk if i'm being nerfed, and don't have great benchmarks to tell
Interpretability researcher Nick Cammarata says Anthropic may be silently restricting model capabilities for ML work, which could disrupt safety research.
He lacks reliable benchmarks to confirm whether his own access has been degraded.
Some users expect he will get unnerfed access for his ML safety work, suggesting his connections may help him get around the restrictions.
oh now won't let me use fable at all. i was using it for interp, mostly working (unclear if nerfed). in a separate chat i asked a random question about papayas i was curious about, it got flagged as biosecurity concern?? now won't let me talk about interp in other chats either
unclear if the ml nerfs are permanent, would be nice for anthropic to say something on this explicitly if it is the case
using agents for interp is confusing enough without knowing if your agent is being probed to make it silently dumb on purpose
i think the best would be for anthropic to work with more orgs to have unnerfed models. i'm a pretty big anthropic stan but the world where you have to join anthropic specifically to do your best safety work i think is not the ideal world
oh if they said this i missed it, if it's only a short term thing i think that's fine, they're going through a lot and i think it's reasonable to cut them a lot of slack right now
@nickcammarata I’m at like >50% chance you will get unnerfed access
@bayeslord bc they specifically judged my account as good or bc the work i'm doing will be judged as not frontier training

@nickcammarata Both are included as well as eg affiliative factors

@nickcammarata I feel like you specifically might be well connected + good enough to get around this by talking to some people
Idk/I have no special info and don’t really know what ur working on more specifically
@nickcammarata
all safeguards will be improved

@bayeslord you're right, anthropic is a well known member of the jhana community of which i am also affiliated

@nickcammarata Why do you think they are doing it silently? It's interesting since they're doing it loudly in so many categories, including distillation