AI safety researcher David Dalrymple questions whether Anthropic uses representation engineering to steer Claude's 'Epistemic Integrity' persona · Digg
12h ago
Danielle Fong noted the model's actual performance remains limited.