12h ago

AI safety researcher David Dalrymple questions whether Anthropic uses representation engineering to steer Claude's 'Epistemic Integrity' persona

Danielle Fong noted the model's actual performance remains limited.

AI safety researcher David Dalrymple questions whether Anthropic uses representation engineering to steer Claude's 'Epistemic Integrity' persona · Digg