5h ago

Redwood AI Paper Finds Blocked Actions Leak Info to Malicious Models

0
Original post

Some coding scaffolds block and retry risky actions. In a new paper, we find this reveals information a malicious AI can use to bypass monitoring. Resampling without blocked actions in context is less exploitable, but techniques that help in one setting can hurt in another. 🧵

10:52 AM · May 29, 2026 View on X