9h ago

Bowen Wang and XLANG Lab release CUA-Gym to generate RLVR training data and prevent reward hacking in computer-use agents

The accompanying dataset features 32,122 tasks across 110 environments.

82134121136.0K

——0——

Original post

#1175@REIINAKANOOP

Bowen Wang@BOWENWANGNLP

RLVR has become the recipe for agentic post-training. But for Computer-Use Agents, the bottleneck is not the algorithm, it is the data. 🐌 🚀 We introduce CUA-Gym: a scalable, lightweight synthesis engine that turns arbitrary task queries into verifiable RLVR data for computer-use agents. The largest open CUA RLVR dataset to date: 🎯 32,122 verifiable RLVR tasks with programmatic setup scripts + rewards 🌐 110 environments: 16 desktop apps + 94 synthesized mock web apps 🏆 Qwen3.5-based CUA models trained with GSPO reach 72.6% on OSWorld-Verified and 56.6% on WebArena 📄 Paper: https://huggingface.co/papers/2605.25624 🏠 Homepage: https://cua-gym.xlang.ai 🤗 Dataset: https://huggingface.co/datasets/xlangai/CUA-Gym 💻 Codebase: https://github.com/xlang-ai/CUA-Gym 🧩 Environments: https://github.com/xlang-ai/CUA-Gym-Hub 🧵[1/6]

7:36 AM · May 26, 2026

QUOTE POST

#886Tao Yu@TAOYDS

We've seen nice recent progress on Scaling CUA RL envs/tasks — but 🎯verifiable rewards🎯 for RL training have been largely missing, and that matters a lot for preventing reward hacking.

@BowenWangNLP's work tackles exactly this: 32K+RLVR tasks across 110 envs, check it out! 👇

Bowen Wang@BowenWangNLP

2:36 PM · May 26, 2026 · 37.1K Views

6:57 PM · May 26, 2026 · 770 Views

Bowen Wang and XLANG Lab release CUA-Gym to generate RLVR training data and prevent reward hacking in computer-use agents

Sentiment

Cluster engagement