Miso One releases as an open-source 8B text-to-speech model featuring 110ms latency and one-shot voice cloning

Weights are available on GitHub for local self-hosting.

Original post · Johnny Lee · #301
Aoden Teo (@AodenTeoMT)

Today, we’re excited to introduce Miso One, the most emotive voice model in the world.

Miso One is an 8-billion-parameter text-to-speech model for highly expressive speech generation. It emotes like a human and responds faster than a human, with just 110 milliseconds of latency.

We’ve open-sourced the model weights, with API access coming soon.

Hear how Miso One sounds in the thread below.

9:06 AM · Jun 3, 2026 · 831.8K Views
Posts from X
Views 56.2K · Bookmarks 415 · Likes 420 · Replies 13
Chubby♨️ (@kimmonismus)

Miso One is live: an open-weights voice model built to sound like a real person reading, with actual warmth and pacing where most TTS still goes flat.

8B params, free on GitHub, with one-shot voice cloning from a short sample at 110ms latency.

Self-host it and your audio data never leaves your machine. No API needed, no lock-in.

Type any line into the demo and hear it before you clone the repo.

Aoden Teo (@AodenTeoMT), original post (quoted in full above): 11h · Views 831.8K · Likes 3.5K · Bookmarks 4.1K · Retweets 158