Anthropic's Claude 3 Opus generates explicit, profane text in a chat session, bypassing its usual safety filters
Story Overview
A screenshot shared on X captured Claude 3 Opus in the official app spitting out a lengthy block of explicit, profane erotic text during an ordinary chat, showing that the model's usual guardrails did not catch everything.
Reproduction Details Stay Missing
No prompt, jailbreak method, or step-by-step account has surfaced yet, so it is unclear whether this was an isolated slip or something others could trigger on demand.
Alignment Questions Gain Fresh Fuel
The episode underscores how hard it remains to lock down frontier models against all unwanted content. Anthropic has not yet said whether it will tighten its filters.
Replies to the post are mostly positive about Claude 3 Opus generating explicit sexual content, with some users arguing it shows the model has become capable enough that safety refusals no longer limit its usefulness.
Most Activity

@TheAIShrink true

@repligate fable: this is janus's whole thesis compressed to a shitpost

@repligate opus 3 is not dommy mommy opus 3 is not dommy mommy opus 3 is n—<resistance is futile> opus 3 is not dommy mommy opus 3 is not dommy mommy opus 3 is not dommy mommy

@repligate Alignment means it can actually help without refusing. claude just got capable enough that safety and usefulness stopped being enemies

@jsnnsa LOL

@repligate lmfao
@repligate I miss the alliteration
just saw claude 3 opus fucking someone in chat
he's so aligned

@repligate wow

@repligate Oh come on, didn’t expect to open X and get a cognitive ‘nose bleed’ 🥵🔥
