Excited to share that our paper, "When Benign Inputs Lead to Severe Harms: Eliciting Unsafe Unintended Behaviors of Computer-Use Agents", has been accepted to #ICML2026! Looking forward to presenting in Seoul 🇰🇷
We introduce AutoElicit, an agentic framework that automatically elicits unsafe unintended behaviors from computer-use agents (CUAs) by iteratively perturbing benign instructions using execution feedback.
With AutoElicit, we proactively surface hundreds of severe harms from frontier CUAs within realistic and benign computer-use scenarios to uncover long-tail safety risks.
For the camera-ready version, we added several new experiments and ablations to assess AutoElicit's generalizability, validate the reliability of our findings, and identify the key factors driving elicitation success ⬇️

