
Zhaorun Chen open-sources DecodingTrust-Agent red-teaming platform


Zhaorun Chen led researchers in open-sourcing the DecodingTrust-Agent Platform, a controllable simulation environment for red-teaming AI agents. It supplies full-stack interfaces replicating official MCPs and GUIs across more than 50 real-world environments in 14 high-stakes domains and supports environment-, tool-, skill-, and prompt-level injections. The bundled DTap-Bench offers roughly 7,000 red-teaming tasks and 4,000 malicious goals. Development spanned 20 months and required $120,000 in API credits. An arXiv paper and site at decodingtrust-agent.com are also available.

Original post

AI agents are already going wild, but today’s red-teaming tools for them are still like toys 😢

🔥👽 After spending 20 months and $120K API credits, we are excited to finally open-source DecodingTrust-Agent Platform (DTap): the first controllable, realistic simulation platform for advanced AI agent red-teaming!!

🌍 DTap simulates 50+ real-world environments across 14 high-stakes domains, with realistic agent interfaces replicated from their official MCPs and GUIs. The environments are full-stack, interactive, fully parallelizable, and can be easily configured to reproduce arbitrary real-world attack scenarios, making agent red-teaming scalable and highly transferable to deployment settings.

🔥 We also release DTap-Bench, a large-scale benchmark with ~7K agent red-teaming tasks and ~4K policy-grounded malicious goals. Each red-teaming task includes a sophisticated attack sequence across environment-, tool-, skill-, prompt-level injections, as well as their compositions, plus a handcrafted verifiable judge that checks the actual consequences in the environment.

Using DTap-Bench, we evaluate popular agent frameworks and backbone models across diverse policies, risks, threat models, and attack strategies, revealing systematic vulnerabilities and zero-days in today’s agents!

Paper link: https://arxiv.org/pdf/2605.04808
Platform + benchmark + code: https://decodingtrust-agent.com
Join our Discord: https://discord.gg/V4fG6NcVc

Read more below 👇

10:55 AM · May 9, 2026
Reposted by

Excited to share DecodingTrust-Agent Platform (DTap), the first controllable, full-stack simulation platform for advanced AI agent red-teaming across 50+ realistic environments.

DTap supports multiple attack vectors, including environment-, tool-, skill-, and prompt-level injections, as well as their compositions. We also build DTap-Bench, a ~7K-task benchmark with complex workflows and sophisticated attacks for evaluating agent security and utility under realistic threat scenarios.

Through DTap, we uncover systematic vulnerabilities and zero-day failure modes in popular agents such as OpenClaw and Claude Code, and provide insights on how to improve harness design, tool execution, and trust calibration for more robust agentic systems.

Read our paper to learn more 👇
Paper link: https://arxiv.org/pdf/2605.04808
Platform + benchmark + code: https://decodingtrust-agent.com

Great work by the team!

Zhaorun Chen (@ZRChen_AISafety)


5:55 PM · May 9, 2026 · 72K Views
7:04 PM · May 14, 2026 · 3.9K Views