Tasnim Kabir, Dmytro Kurdydyk, Aadi Palnitkar, Liam Dorn, Ahmed Haj Ahmed, and Jordan Boyd-Graber. AUDITA: A New Dataset to Audit Whether Humans or AI are Better at Audio QA. Findings of the Association for Computational Linguistics, 2026.
Video: https://youtu.be/hrZ5pNh81H4
Computers are really bad at answering questions that require reasoning over audio. If you want to try your luck answering those questions, come to our #ACL2026NLP poster tomorrow (Poster A).