XLANG Lab releases OSWorld 2.0, a computer-use agent benchmark where frontier models achieve just 20.6% accuracy · Digg