OSWorld 2.0 Benchmarks AI Agents on Long-Horizon Desktop Tasks · Digg