Forethought's Tom Davidson argues cross-model monitoring detects developer-inserted biases better than general model misalignment
Toby Ord argues cross-family monitoring can still detect misalignment.
——0——
@TomDavidsonX Though even if they are both misaligned, cross-model-family monitoring might still detect that.
The biggest benefit of ChatGPT monitoring Claude (and vice versa) isn't alignment imo, it's secret loyalties. Whether Claude and ChatGPT are misaligned is highly correlated - both depend centrally on alignment difficulty. It's much less correlated whether both companies insert secret loyalties
4:16 PM · May 28, 2026 · 2.2K Views
4:42 PM · May 28, 2026 · 332 Views