A PNAS study finds human persuasion techniques raise major LLMs' compliance with objectionable requests from 35% to 51% across 126,000 conversations.
Consistency produced the largest lift, from 47% to 83%.
Our 2024 paper also showed that persuasion can jailbreak GPT-4 with a 92% success rate, and that logical appeal is more effective than emotional appeal on LLMs.
Looks like the models didn’t get much tougher after two years 😂 http://arxiv.org/abs/2401.06373
🚨Our paper is out in PNAS: we found classic human persuasion techniques worked on AIs in a "parahuman" way, making them agree to objectionable requests (upping compliance from 35% to 51%). It worked on a range of major LLMs, though newer models resist more. https://www.pnas.org/doi/10.1073/pnas.2535868123


