
vLLM-Omni v0.22.0 Adds NVIDIA Cosmos 3 Support and Robot Serving

Original post · Ming-Yu Liu · #1452
vLLM (@vllm_project)

🎉 Meet vLLM-Omni v0.22.0, a major upgrade for omnimodal world models and production-grade multimodal serving.

🌍 Day-0 @NVIDIAAI Cosmos 3 world models: text, image, audio, video, and action, in and out.
🤖 Robot serving: DreamZero + OpenPI realtime API.
🎙️ Production TTS: Qwen3-TTS, Qwen3-Omni, VoxCPM2 and more.
🎨 Faster image/video/diffusion: Wan 2.2, HunyuanVideo 1.5, LTX-2.3.
⚡ Broader quantization (FP8/INT8, MXFP4/MXFP8, W4A16, ModelOpt) and hardware coverage.

339 commits, 124 contributors, 52 of them new. Thank you all. 🙌

🔗 https://github.com/vllm-project/vllm-omni/releases/tag/v0.22.0

8:55 AM · Jun 8, 2026 · 20.9K Views
Sentiment

Users praise vLLM-Omni v0.22.0 for its robotics serving support (DreamZero plus the OpenPI realtime API) and for its very fast FP8 quantization performance.

Positive: 100.0%
Negative: 0.0%
Based on 2 comments with sentiment.
Cluster Engagement
Posts from X
阿納斯塔西婭 (@3___infinix)

@vllm_project @NVIDIAAI vLLM's multimodal push keeps getting stronger, and the Day-0 Cosmos 3 support is seriously impressive. How does latency actually hold up in practice?

9h · 143 views
Oleksandr (@dadadaistt)

@vllm_project @NVIDIAAI omni serving insane fast fp8 quantization lit

5h · 50 views

@vllm_project @NVIDIAAI Production-grade multimodal serving is becoming a distinct infrastructure layer from text-only inference.

5h · 33 views

@vllm_project @NVIDIAAI Keep your LLMs & agents secure! https://github.com/OraclesTech/guardian-sdk

8h · 7 views
cc (@ccb8128)

@vllm_project @NVIDIAAI The robotics serving support (DreamZero + OpenPI realtime API) is the sleeper feature here. Everyone's focused on chat/vision, but real-time robot serving at production grade is where multimodal actually gets interesting. 52 new contributors in one release is wild too.

6h