AI · 7h ago

Boson AI releases Higgs Audio v3, a 4B-parameter chat-native TTS model that streams audio before sentences are fully generated

The model supports zero-shot voice cloning across 100 languages



Original post

Banghua Zhu (#1147)

LMSYS Org @lmsysorg

🎉 Meet Higgs Audio v3 TTS from @boson_ai, a ~4B chat-native TTS model for real-time voice agents. Day-0 support is live in SGLang-Omni!

> Low-latency streaming
> 100 languages, single-digit WER/CER
> Zero-shot voice cloning from a short clip
> 20+ inline tokens for emotion, style & SFX
> 14.74 req/s @ RTF 0.262 on 1× H100

👉 Cookbook: http://sgl-project.github.io/sglang-omni/cookbook/higgs_tts.html
Run it now with SGLang-Omni!

2:14 PM · Jun 4, 2026 · 2.3K Views
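The post reports single-digit WER/CER. For TTS, these scores are usually obtained by transcribing the synthesized audio with an ASR model and comparing the transcript with the input text; the post does not say which ASR model or text normalization was used. As a reminder of what the metric measures, here is a minimal Python sketch of word and character error rate computed from Levenshtein edit distance. The function names are illustrative, not part of any Higgs or SGLang-Omni API. "Single-digit" means a score below 10%, i.e. under 0.10.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions and deletions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free when r == h)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: commonly used for languages written without spaces between words."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, `wer("the cat sat on the mat", "the cat sat on a mat")` is 1/6 ≈ 0.167 (one substituted word out of six), and `cer("你好世界", "你好视界")` is 0.25.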

Sentiment

Users are excited that Boson AI and LMSYS released the open-weights Higgs Audio v3 TTS model, which appears to be faster than the prior 8B version.

Positive: 100.0%
Negative: 0.0%

1 comment with sentiment.


Posts from X

Most Activity

Views 1.8K · Bookmarks 17 · Likes 32 · Replies 4

LMSYS Org @lmsysorg

🎙️ New blog: Higgs Audio v3 TTS on SGLang-Omni: Real-Time, Controllable Speech for Voice Agents

Higgs (@bosonai) is a ~4B chat-native TTS model (Qwen3-4B backbone, interleaved text + audio tokens) built for streaming voice:
🎙️ Synthesis starts before a full sentence arrives
🌍 100 languages at single-digit WER/CER
🗣️ Zero-shot voice cloning from a short reference clip, across languages
🎛️ 20+ inline control tokens for emotion, style, prosody & sound effects
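The first bullet says synthesis starts before a full sentence arrives. How Higgs Audio v3 schedules this internally with its interleaved text and audio tokens is not described here, so the sketch below only illustrates the general idea from the client side: text that arrives incrementally (for example, streamed from an LLM) is cut into short segments at clause punctuation or after a few words, so the first segment can be handed to a TTS engine while the rest of the sentence is still being generated. The `stream_segments` function, its `max_words` threshold and the punctuation rule are all hypothetical choices for illustration.

```python
from typing import Iterable, Iterator

# Hypothetical segmentation rule: flush at clause punctuation or after max_words words.
BREAK_CHARS = set(",;:.!?")


def stream_segments(text_deltas: Iterable[str], max_words: int = 4) -> Iterator[str]:
    """Yield short text segments as soon as they are ready, without waiting for the sentence to end."""
    pending: list[str] = []  # complete words not yet emitted
    partial = ""             # trailing fragment that the next delta may extend
    for delta in text_deltas:
        partial += delta
        # Everything before the last space is a complete word; the remainder may still grow.
        *complete, partial = partial.split(" ")
        for word in complete:
            if not word:  # skip empty strings produced by repeated spaces
                continue
            pending.append(word)
            if len(pending) >= max_words or word[-1] in BREAK_CHARS:
                yield " ".join(pending)  # a TTS engine could start on this segment now
                pending = []
    # End of stream: flush the last fragment and any buffered words.
    if partial:
        pending.append(partial)
    if pending:
        yield " ".join(pending)


if __name__ == "__main__":
    deltas = ["Hel", "lo the", "re, how ", "are you do", "ing to", "day?"]
    for segment in stream_segments(deltas):
        print(segment)
```

With the deltas in the usage block, the generator yields "Hello there,", then "how are you doing", then "today?"; the first segment is ready after only three of the six deltas have been read, which is the latency benefit streaming is after.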

⚡ Serving with SGLang-Omni
Runs as a multi-stage pipeline: preprocessing → audio encoder → TTS engine → vocoder
✅ CUDA Graph decode
✅ Async one-step-lookahead
✅ Batched vocoder & encoder
✅ RadixAttention partitioned by reference audio to amortize repeated cloning
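The serving notes mention RadixAttention partitioned by reference audio. The post does not spell out the scheme; one plausible reading is that cached prefixes are grouped per reference clip, so later requests cloning the same voice reuse work already done for that clip. The sketch below models this assumption with one token trie per clip, keyed by a SHA-256 hash of the clip bytes. It is a deliberate simplification: a real radix tree compresses edges and stores KV-cache blocks with eviction, none of which is shown, and `PartitionedPrefixCache` is a hypothetical name, not an SGLang class.

```python
import hashlib
from collections import defaultdict


class TrieNode:
    """One cached token; children are keyed by the next token id."""
    __slots__ = ("children",)

    def __init__(self) -> None:
        self.children: dict[int, "TrieNode"] = {}


class PartitionedPrefixCache:
    """Hypothetical sketch: a separate prefix trie for each reference-audio clip."""

    def __init__(self) -> None:
        # Partition key (hash of the reference clip) -> root of that clip's trie.
        self.roots: dict[str, TrieNode] = defaultdict(TrieNode)

    @staticmethod
    def partition_key(ref_audio: bytes) -> str:
        return hashlib.sha256(ref_audio).hexdigest()

    def match_and_insert(self, ref_audio: bytes, tokens: list[int]) -> int:
        """Return how many leading tokens were already cached for this clip, then cache the rest."""
        node = self.roots[self.partition_key(ref_audio)]
        matched = 0
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                break
            node = child
            matched += 1
        # Extend the trie with the unmatched suffix so later requests can reuse it.
        for tok in tokens[matched:]:
            node = node.children.setdefault(tok, TrieNode())
        return matched
```

In this sketch, a second request that clones the same clip and shares the first three tokens reports a match of 3, while the same tokens under a different clip start from 0, because each clip has its own partition.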

📊 Performance: 14.74 req/s at RTF 0.262 with 16 concurrent requests on a single H100 (bf16)
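The performance line can be unpacked with two standard formulas. Real-time factor (RTF) is synthesis time divided by the duration of the audio produced, so RTF 0.262 means roughly 2.62 s of compute per 10 s of speech. Little's law (L = λW) relates the 16 concurrent requests and the 14.74 req/s throughput to an average time in the system of about 16 / 14.74 ≈ 1.085 s per request. This is a back-of-envelope estimate that assumes a steady state with all 16 slots busy; the post does not report per-request latency or say how RTF was averaged across concurrent requests.

```python
def real_time_factor(compute_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced; below 1 is faster than real time."""
    return compute_seconds / audio_seconds


def littles_law_latency(concurrency: float, throughput_rps: float) -> float:
    """Average time a request spends in the system, W = L / lambda (steady state assumed)."""
    return concurrency / throughput_rps


# Figures quoted in the post: RTF 0.262, 14.74 req/s, 16 concurrent requests.
RTF = 0.262
print(f"10 s of audio takes about {RTF * 10:.2f} s to synthesize")
print(f"average time per request ~ {littles_law_latency(16, 14.74):.3f} s")
```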

