Cameron R. Jones and Benjamin K. Bergen report in PNAS that GPT-4.5 was judged to be human 73 percent of the time in five-minute, three-party Turing tests.
Llama-3.1-405B reached 56 percent, and older models also cleared 50 percent.
A paper published in @PNASNews today: "three current AI systems achieve a pass rate of at least 50% in a standard Turing test." The systems were GPT-4o, Llama-3.1, and GPT-4.5. All are over 1-2 years old. https://www.pnas.org/doi/full/10.1073/pnas.2524472123
Really excited that this is out in @PNASNews! We find that 2 LLMs (GPT-4.5 and Llama-3.1-405B) pass a 5-minute Turing test. As an update to our preprint, we also find that GPT-5 and Llama pass a 15-minute test! 🧵