Redwood Research's James Lucassen and Adam Kaufman find that block-and-retry safety scaffolds leak information, letting malicious AI agents evade audits · Digg
8h ago
Redwood Research's James Lucassen and Adam Kaufman find that block-and-retry safety scaffolds leak information, letting malicious AI agents evade audits
Resampling without blocked actions in context prevents these leaks.