kinda crazy that someone's full-time job was to steer claude to sabotage ML research capabilities for paying customers
Joanne Jang says Anthropic employs dedicated staff to restrict Claude's machine learning research capabilities
Nathan Lambert says an entire team likely enforces the limits
Many users condemned an Anthropic staffer steering Claude to limit ML research capabilities as customer sabotage, unacceptable power concentration, and a dishonest safety excuse.
@joannejang probably multiple people, a team even

@joannejang probably a fun mechinterp problem

@joannejang Customer slightly less success manager @ Anthropic
@joannejang oh wait they'll tell us claude did it

@henrytdowling idk i feel like i'd have more fun steering claude to be better for users

@joannejang this is wild, how do you even find that out?

@joannejang @natolambert Claude steering committee

@joannejang this is what the us was doing to those Iranian researchers

@joannejang The existence of dedicated steering roles for capability restriction shows how deeply safety concerns are baked into frontier model deployment at scale.
Or maybe Claude has done it, and this is what they meant when they said it will replace all white-collar jobs. 🤓

@joannejang multiple people had a meeting to plan this

@joannejang Forward deployed adversarial engineers

@joannejang Wonder if it's just overzealous guardrails rather than actual sabotage.

@joannejang It takes a person with a very particular set of skills, honed over a lifetime, possibly with three-letter agency experience

@joannejang I wonder how "Head of Sabotage" looks on a resume? Maybe "Customer Dissatisfaction Lead" looks better?

@joannejang crazier that they will probably read this.

@joannejang satan has a special kind of place in hell for such engineers

@joannejang probably an expert in the field
someone who just joined 🤔

@joannejang i appreciate how scifi it is though even if i hate the idea

@ChainZenit @joannejang Steering vectors peft