Alibaba's Qwen team releases Qwen-AgentWorld, a language world model trained from scratch to simulate agent environments

Original post

Alexander Doria@Dorialexander

@teortaxesTex moving it all here (or something adjacent)

Qwen@Alibaba_Qwen

📣📣 Meet Qwen-AgentWorld — a native language world model that simulates 7 agent environments (MCP, Search, Terminal, SWE, Web, OS, Android) within a single model. Environment modeling is the training objective from day one, not a post-hoc adaptation.

🤔 LLMs are trained to be better agents — better at acting in environments. But nobody has trained them to model the environments themselves.

🗺️ Our roadmap: investigate how language world modeling can push the boundaries of general agent capabilities, along two routes:

1️⃣ Build a foundation model for environment simulation — outperforming Claude Opus 4.8 and GPT-5.4 on AgentWorldBench

2️⃣ Investigate how world modeling enhances agent training:

🔬 Controllable Sim RL (agentic RL with LWM as environments) surpasses training in real environments

🧠 Learning to predict environments (LWM warm-up) makes agents stronger: remarkably, even without any agent-specific training, this predictive knowledge transfers to agentic tasks with zero fine-tuning

📑 Paper: https://arxiv.org/abs/2606.24597
📖 Blog: https://qwen.ai/blog?id=qwen-agentworld
💻 GitHub: https://github.com/QwenLM/Qwen-AgentWorld
🤗 HuggingFace: https://huggingface.co/collections/Qwen/qwen-agentworld
🧩 ModelScope: https://modelscope.cn/collections/Qwen/Qwen-AgentWorld

9:21 AM · Jul 4, 2026 · 769 Views

Developer Impact

Open weights invite hands-on testing

The 35B MoE checkpoint and the new AgentWorldBench evaluation suite are released under Apache 2.0 on Hugging Face, letting developers run local inference and score environment fidelity without waiting for a hosted API.
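
As a rough illustration of what "scoring environment fidelity" could look like in practice, the sketch below wraps a text-in/text-out world model as an environment and replays a recorded trajectory against it. Everything here is an assumption made for illustration: the `SimulatedEnv` class, the prompt layout, the `exact_match_rate` metric, and the stub model are hypothetical and are not the Qwen-AgentWorld API or the AgentWorldBench scoring method.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical interface: any function mapping a text prompt to predicted text.
WorldModel = Callable[[str], str]


@dataclass
class SimulatedEnv:
    """Environment whose observations are predicted by a language world model."""
    world_model: WorldModel
    env_type: str  # e.g. "Terminal"; this label format is an assumption
    history: List[Tuple[str, str]] = field(default_factory=list)

    def build_prompt(self, action: str) -> str:
        # Serialize past (action, observation) pairs, then the new action.
        lines = [f"Environment: {self.env_type}"]
        for past_action, past_obs in self.history:
            lines.append(f"Action: {past_action}")
            lines.append(f"Observation: {past_obs}")
        lines.append(f"Action: {action}")
        lines.append("Observation:")
        return "\n".join(lines)

    def step(self, action: str) -> str:
        # Ask the model for the next observation and record it in the history.
        observation = self.world_model(self.build_prompt(action)).strip()
        self.history.append((action, observation))
        return observation


def exact_match_rate(env: SimulatedEnv, recorded: List[Tuple[str, str]]) -> float:
    """Fraction of recorded steps whose real observation the simulator reproduces."""
    if not recorded:
        return 0.0
    hits = sum(1 for action, real_obs in recorded
               if env.step(action) == real_obs.strip())
    return hits / len(recorded)


def stub_world_model(prompt: str) -> str:
    # Stand-in for a real model call; it only inspects the latest action line.
    last_action = prompt.splitlines()[-2].removeprefix("Action: ")
    return "hello" if last_action == "echo hello" else ""


env = SimulatedEnv(world_model=stub_world_model, env_type="Terminal")
recorded = [("echo hello", "hello"), ("ls", "README.md")]
score = exact_match_rate(env, recorded)
print(score)
```

In this sketch the history holds the simulator's own predictions rather than the recorded observations, so errors compound across steps; feeding the real observations back in instead would measure single-step accuracy. Exact string match is also a deliberately crude stand-in for whatever metric a real benchmark uses.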

Open Question

Simulation-first training sparks transfer debate

The model is reported to improve on its base model and to score above Claude Opus 4.8 and GPT-5.4 on AgentWorldBench, but replies in the thread question how well training on simulated trajectories carries over to live agent tasks.

🎭@deepfates

They did Worldsim to a grape

Herbie Bradley@herbiebradley

@Dorialexander @teortaxesTex I agree many fun things to do here (eg computer use agents trained on sims of cloned website UIs). But my current take is that people underestimate the "last mile" effects from the sim2real gap; cloning private stuff works well for Slack/Stripe API, less so for google dot com

Alexander Doria@Dorialexander

@herbiebradley @teortaxesTex I mean you obviously need a seeding infrastructure on top. Just save up lots of bitter lesson headache as the world just need to be minimally discoverable and artefacts can spawn on the fly (+ we can emulate private api/software).

bling@blingdivinity

@deepfates when o1 came out i couldnt stop saying "they did RL on a Language Model!" in the surgery on a grape voice

Herbie Bradley@herbiebradley

this works well, but doesn't it basically just add more constraints to what's trainable?

success not only depends on whether there is a good oracle (eg compiling the code) but *also* on whether the environment is relatively markovian and if you can actually generate enough data in reasonable time from the real thing in the first place

if the environment is highly non-markovian this makes no sense

Alexander Doria@Dorialexander

@herbiebradley @teortaxesTex Yes totally. My current bet is that even imperfect sim could be enough to elicit system understanding during agentic pretraining and we could keep costly real env/better emulation for some kind of late post-train.

🎭@deepfates

@blingdivinity me too 💀 God that seems like so long ago
