Agents' Last Exam Benchmark Tests AI Agents On 1,000+ Economic Tasks · Digg