New Methods Test If Coding Agents Undermine Oversight Safeguards

Will coding agents take opportunities to undermine the safeguards designed to oversee them? We tackle this question with two methods: automated auditing in simulated agentic environments, and scheming honeypot evaluations built from real internal alignment research codebases. Read more in our blog post.

10:09 AM · May 29, 2026