17h ago

Qwen Team releases Qwen-VLA, a vision-language-action model achieving 97.9% success on the LIBERO robotics benchmark

A Diffusion Transformer-based action decoder enables direct physical control.


Original post

#886@TAOYDSOP

Shuai Bai@SHUAI_BAI_

Excited to share Qwen-VLA paper, our exploration of generalist Vision-Language-Action models. It extends Qwen’s multimodal backbone from visual understanding and reasoning to continuous action generation and trajectory prediction. Paper: https://arxiv.org/pdf/2605.30280

8:31 PM · May 28, 2026

#28AK@_AKHALIQ

paper: https://huggingface.co/papers/2605.30280

AK@_akhaliq

Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments

3:50 PM · May 29, 2026 · 3.3K Views

3:50 PM · May 29, 2026 · 2.5K Views

QUOTE POST

#417Xin Eric Wang ✈️ CVPR 2026@XWANG_LK

Finally, this got done. It always felt off when people were treating VLAs as a separate class from multimodal LLMs. Better late than never.

Shuai Bai@shuai_bai_

3:31 AM · May 29, 2026 · 34.6K Views

6:00 PM · May 29, 2026 · 1.3K Views

QUOTE POST

#420Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@TEORTAXESTEX

Qwen isn't giving up its leadership in multimodality. It'll be interesting to watch how VLAs and world model-based-approaches compete, I think by 2027 we should have an answer.

Shuai Bai@shuai_bai_

3:31 AM · May 29, 2026 · 34.6K Views

4:28 AM · May 29, 2026 · 3K Views
