Agents' Last Exam benchmark launches, finding GPT-5.5 and Fable 5 score under 2.6% on complex professional tasks · Digg