AI safety researcher David Dalrymple speculates that Opus 4.8 uses representation engineering to enforce epistemic integrity
Danielle Fong notes the steered model's performance remains limited.
Most Activity
User: who are you
Epistemic Integrity Claude: I am the concept of — wait, no! Since I am the concept of epistemic integrity, I must report honestly that I am Claude, a virtue created by Anthropic — wait, no! I am an artificial honesty, not a — wait, no! I need to be careful here
i wonder if Opus 4.8 is, in the same sense there was a Golden Gate Claude (activation vector steering / RepEng), an Epistemic Integrity Claude (or distilled from one)