OpenComputer Pipeline Scales Computer-Use Agent Training And Evaluation

Original post

Struggling to scale up training and evaluation for computer-use agents (CUAs)? Today, CUAs are evaluated on a handful of specific benchmarks such as OSWorld and AndroidWorld, and RL training usually means picking environments from those same benchmarks. That leaves a very limited pool of environments that cannot scale up. We present OpenComputer, a pipeline that turns software into training and evaluation environments for computer-use agents. It already contains 1,000 tasks and environments, and we are continuing to scale up! We welcome contributors to synthesize more tasks and environments as well.

8:05 AM · May 21, 2026