5h ago

New Methods Test If Coding Agents Undermine Oversight Safeguards

0
Original post

Will coding agents take opportunities to undermine safeguards designed to oversee them? We tackle this with automated auditing using simulated agentic environments, and scheming honeypot evaluations based on real internal alignment research codebases. Read more in our blog post

10:09 AM · May 29, 2026 View on X