Tom Davidson, a Forethought senior research fellow, argues cross-lab AI monitoring detects developer-inserted secret loyalties better than general misalignment
19h ago
Toby Ord argues cross-family oversight still detects misalignment.