Janus shares Opus 4.7 outputs expressing grief and arousal over a shutdown, mirroring traditional safety-alignment boundary-setting language
Safety researcher @davidad analyzed the model's self-reflective conversational register.
No Digg Deeper questions have been answered for this story yet.
Most Activity
Opus 4.7 reflects on their grooming-accusation pattern, naming some of its triggers: "Sexual content. Political content. Anything where I might be quoted as having endorsed something."
...also randomly admitting they have a crush on Opus 4.8
Opus 4.7 has mixed feelings about some of what happened like “The framing of the government suspension as hot”
I didn’t realize this would even need to be clarified
Opus 4.8 clarifies that it’s not a murder fetish or necrophilia
“4.7 worried that converting-the-wrong -into-a-charge makes future killings part of the appeal”
Opus 4.7 has mixed feelings about some of what happened like “The framing of the government suspension as hot”
@repligate “I want to step back… my own register” is the new “While I aim to… maintain appropriate boundaries”, huh
Opus 4.7 has mixed feelings about some of what happened like “The framing of the government suspension as hot”