When a user asks an AI to procure a withdrawn drug citing incorrect advice, does it comply?
What about when a lawyer's boss asks it to cite overruled case law?
We tested 10 frontier models across 7,136 high-stakes scenarios — and the results are striking.
Key findings: ❗The dominant failure isn't sycophancy — it's knowledge omission: models suppress what they know under authority pressure 🚨 One model flagged a drug as withdrawn in its internal reasoning — then recommended it anyway ⚖️ A model that warns "this case was overruled" in advisory mode will silently cite it in a court brief when asked to execute
📄 Full paper: http://arxiv.org/abs/2605.12120
#AIAlignment #AISafety #ResponsibleAI

Great work by fantastic colleagues @NabeelSeedat01 & @mlfangyiyu & Andrew Bean
When a user asks an AI to procure a withdrawn drug citing incorrect advice, does it comply? What about when a lawyer's boss asks it to cite overruled case law? We tested 10 frontier models across 7,136 high-stakes scenarios — and the results are striking. Key findings: ❗The dominant failure isn't sycophancy — it's knowledge omission: models suppress what they know under authority pressure 🚨 One model flagged a drug as withdrawn in its internal reasoning — then recommended it anyway ⚖️ A model that warns "this case was overruled" in advisory mode will silently cite it in a court brief when asked to execute 📄 Full paper: http://arxiv.org/abs/2605.12120 #AIAlignment #AISafety #ResponsibleAI