
LMSYS Org adds day-zero SGLang-Omni support for the open-source MOSS-TTS-Local Transformer v1.5 model

The integration serves 5.976 requests per second in non-streaming mode on two GPUs, at a real-time factor (RTF) of 0.644



Original post

LMSYS Org @lmsysorg

🎉 SGLang-Omni now serves MOSS-TTS-Local Transformer v1.5 from @Open_MOSS on day 0! This is an open 48 kHz stereo TTS model built on a Qwen3-4B backbone.

✅ Zero-shot voice cloning + native streaming at 48 kHz stereo
✅ 31 languages, trained on ~4M hours of speech
✅ Duration control + explicit pause markup + long-form generation up to 10 min
✅ 5.976 req/s non-streaming at RTF 0.644, 1.75% WER (SeedTTS English, 2× GPU)
✅ Three-stage pipeline: reference encoding → AR engine → streaming vocoder, with frame-level CUDA Graphs

Cookbook: https://sgl-project.github.io/sglang-omni/cookbook/moss_tts_local.html

Run it now with SGLang-Omni!
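For readers unfamiliar with the RTF figure: real-time factor is conventionally the wall-clock time spent synthesizing divided by the duration of the audio produced, so values below 1.0 mean faster than real time. The post does not say how its RTF of 0.644 was measured (for example, per request or while sustaining 5.976 req/s), so this minimal Python sketch assumes a single RTF value applies uniformly regardless of clip length, which is a simplification. The helper names are illustrative only.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = wall-clock synthesis time / duration of the generated audio.

    Values below 1.0 mean audio is produced faster than it plays back.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def synthesis_seconds_at(audio_seconds: float, rtf: float) -> float:
    """Invert the definition: expected wall-clock time for a clip at a given RTF."""
    return audio_seconds * rtf


# Figure from the post; its measurement conditions are not stated there (assumption:
# the same RTF holds for any clip length).
REPORTED_RTF = 0.644

ten_second_clip = synthesis_seconds_at(10.0, REPORTED_RTF)   # about 6.44 s
ten_minute_clip = synthesis_seconds_at(600.0, REPORTED_RTF)  # about 386.4 s
```

Under that assumption, a 10-minute long-form request (the maximum the post mentions) would take roughly 6.4 minutes of wall-clock time to complete, although with streaming output a listener can begin hearing audio well before generation finishes.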

OpenMOSS @Open_MOSS

🤗 MOSS-TTS-Local Transformer v1.5 is now open source.

Built with a pure autoregressive Audio Tokenizer + LLM paradigm:

>MOSS-Audio-Tokenizer-v2, 2B params
>Qwen3-4B backbone
>Native 48 kHz stereo audio
>Streaming output with theoretical sub-100 ms TTFT
>Zero-shot voice cloning
>Inline [pause] control
>🇺🇸 🇯🇵 🇰🇷 31-language synthesis
>SGLang-Omni day-0 support 🎉 @sgl_project @lmsysorg

Designed for voice agents, digital humans, game NPCs, audiobooks, and real-time speech generation.


10:44 PM · Jun 17, 2026 · 7.2K Views
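The LMSYS post names three serving stages (reference encoding → AR engine → streaming vocoder), and OpenMOSS cites a theoretical sub-100 ms TTFT for streaming output. This Python sketch shows why chaining such stages lazily lets the first audio chunk go out before the whole utterance is generated. Every function here (`encode_reference`, `ar_engine`, `streaming_vocoder`, `synthesize`) is a hypothetical toy stand-in, not an SGLang-Omni or MOSS-TTS API. The arithmetic inside them is placeholder, the chunk size of 4 is an assumption, and frame-level CUDA Graphs are not modeled.

```python
from typing import Iterator, List

# Hypothetical toy stand-ins for the three stages named in the announcement.
# They model only the data flow between stages, not the real models.


def encode_reference(reference_audio: List[float]) -> int:
    """Stage 1 (toy): reduce a reference clip to a single integer 'speaker' seed."""
    return int(sum(abs(s) for s in reference_audio) * 1000) % 1024


def ar_engine(text: str, speaker_seed: int) -> Iterator[int]:
    """Stage 2 (toy): emit one audio token per step, each depending on the previous one."""
    prev = speaker_seed
    for ch in text:
        prev = (prev * 31 + ord(ch)) % 1024
        yield prev


def streaming_vocoder(tokens: Iterator[int], frames_per_chunk: int = 4) -> Iterator[List[float]]:
    """Stage 3 (toy): decode a chunk of tokens to samples as soon as the chunk is full."""
    buf: List[int] = []
    for tok in tokens:
        buf.append(tok)
        if len(buf) == frames_per_chunk:
            yield [t / 1023.0 for t in buf]  # toy decode: one sample in [0, 1] per token
            buf = []
    if buf:
        yield [t / 1023.0 for t in buf]  # flush the final partial chunk


def synthesize(text: str, reference_audio: List[float]) -> Iterator[List[float]]:
    """Chain the stages lazily so audio can be consumed before generation finishes."""
    seed = encode_reference(reference_audio)
    return streaming_vocoder(ar_engine(text, seed))
```

Calling `next()` on `synthesize("hello world", [0.1, -0.2, 0.3])` returns the first chunk after only four AR steps, whereas a non-streaming server would wait for all eleven before returning anything. In a real system, time to first audio would also include reference encoding and the first vocoder pass.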

Sentiment

Most comments celebrated SGLang-Omni's day-zero support for OpenMOSS's MOSS-TTS-Local Transformer v1.5 and praised the team, while one objected that the announcement assumes uncommon Docker or WSL setups.

Positive: 75.0%
Negative: 25.0%

Based on 4 comments with sentiment.
