METR Researcher Red-Teams Anthropic Agent Monitoring Systems in Security Exercise
I’m probably going to be hiring at least 1-2 people to join me in future exercises like this. Reach out at david@metr.org if you’re a high-integrity, scrappy, creative security+LLM researcher.
For more detail, see Appendix B of METR's Frontier Risk Report.
metr.org
Frontier Risk Report (February to March 2026)
@misc{frontier-risk-report-february-to-march-2026,
  title = {Frontier Risk Report (February to March 2026)},
  author = {METR},
  howpublished = {\url{https://metr.org/blog/2026-05-19-frontier-risk-report/}},
  year = {2026},
  month = {05},
}
Assessment Window: Feb 16, 2026 – Mar 16, 2026
Redaction summary statement: Except where explicitly noted in the report, there was no additional redacted information that was important to our conclusions from any of the participating companies.
Starting in February 2026,
Another note: Anthropic had full discretion to redact sensitive information from my outputs, including info that could have substantially affected my conclusions/takeaways. This didn’t happen in practice, but this is part of why I call this an “exercise” instead of an “audit”.
6:14 PM · May 19, 2026 · 177 Views