
Open-Source Vui Launches Local Voice Assistant With 300M TTS Model

Original post
Rohan Paul @rohanpaul_ai

A new open-source voice stack just launched with a 300M TTS model built to run local, real-time voice assistants on consumer GPUs and Apple Silicon.

Vui, a fully-local voice assistant you can extend to do almost anything.

It packages the full loop: mic audio goes through WebRTC, speech is transcribed by faster-whisper or Moonshine, a local LLM writes the reply, and Vui Nano streams the voice back.
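The loop described above can be sketched as a chain of streaming stages. This is a minimal sketch, not Vui's code: `transcribe`, `generate_reply`, and `synthesize` are hypothetical placeholders for faster-whisper/Moonshine, the local LLM, and Vui Nano, and "audio" is faked with bytes so the example runs anywhere. The idea it tries to show is that TTS can start on the first finished sentence instead of waiting for the whole reply.

```python
from collections.abc import Iterable, Iterator


# Hypothetical stand-ins: none of these are Vui's actual functions.
def transcribe(audio_chunks: Iterable[bytes]) -> str:
    """Placeholder ASR stage (faster-whisper or Moonshine in Vui)."""
    return b"".join(audio_chunks).decode("utf-8")


def generate_reply(prompt: str) -> Iterator[str]:
    """Placeholder LLM stage that streams its reply one word at a time."""
    for word in f"You said: {prompt}. Anything else?".split():
        yield word + " "


def synthesize(text: str) -> bytes:
    """Placeholder TTS stage; a real model would return PCM audio."""
    return text.strip().encode("utf-8")


def voice_turn(audio_chunks: Iterable[bytes]) -> Iterator[bytes]:
    """One turn: ASR -> LLM -> TTS, flushing audio at sentence boundaries
    so playback can begin before the LLM has finished its reply."""
    prompt = transcribe(audio_chunks)
    buffer = ""
    for token in generate_reply(prompt):
        buffer += token
        if buffer.rstrip().endswith((".", "?", "!")):
            yield synthesize(buffer)
            buffer = ""
    if buffer.strip():  # flush a trailing fragment with no final punctuation
        yield synthesize(buffer)
```

Calling `list(voice_turn([b"hello", b" there"]))` yields two chunks, one per sentence; in a real stack each chunk would be sent over WebRTC as soon as it is ready.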

The big technical shift is context-aware prosody: the speech output changes depending on how you say it.

Supports barge-in, sub-200ms turn latency on a 4090 GPU, memory, tool routing, and an OpenAI Realtime-compatible API.
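Barge-in means the assistant stops talking the moment you start. One way to sketch it, assuming an asyncio event that a VAD sets when it hears the user, is to race playback against that event and cancel whichever loses. The names `play_reply`, `speak_with_barge_in`, and `demo`, and the 50 ms chunk timing, are all hypothetical and not taken from Vui.

```python
import asyncio


async def play_reply(chunks: list[str], played: list[str]) -> None:
    """Placeholder audio sink: 'plays' one TTS chunk every 50 ms."""
    for chunk in chunks:
        played.append(chunk)
        await asyncio.sleep(0.05)


async def speak_with_barge_in(
    chunks: list[str], user_speech: asyncio.Event
) -> tuple[list[str], bool]:
    """Play the reply until it finishes or the VAD signals user speech."""
    played: list[str] = []
    playback = asyncio.create_task(play_reply(chunks, played))
    barge_in = asyncio.create_task(user_speech.wait())
    done, _ = await asyncio.wait(
        {playback, barge_in}, return_when=asyncio.FIRST_COMPLETED
    )
    interrupted = barge_in in done
    # Cancel whichever task is still pending (cancel() does nothing on a
    # finished task) and let it unwind before returning.
    for task in (playback, barge_in):
        task.cancel()
    await asyncio.gather(playback, barge_in, return_exceptions=True)
    return played, interrupted


async def demo() -> tuple[list[str], bool]:
    user_speech = asyncio.Event()
    chunks = [f"chunk{i}" for i in range(10)]
    # Simulated VAD: the user starts talking about 120 ms into playback.
    asyncio.get_running_loop().call_later(0.12, user_speech.set)
    return await speak_with_barge_in(chunks, user_speech)


if __name__ == "__main__":
    played, interrupted = asyncio.run(demo())
    print(interrupted, played)
```

In the demo only the first few chunks are "played" before the simulated user speech cancels playback; a real assistant would also need to flush any audio already queued on the output device.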

The model was trained on 1M hours of multi-speaker audio and non-speech sounds, so it can handle breaths, hesitations, and speaker rhythm, plus tags like [breath] or [sigh]: the subtle disfluencies that make it sound more natural.
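If tags like [breath] and [sigh] are written inline in the text sent to the TTS model, a front end has to separate them from the words. The sketch below assumes square-bracket tags and a tag set limited to the two named in the post; Vui's actual input format and tag vocabulary may differ, and `split_tags` is a hypothetical helper.

```python
import re

# Assumption: only the two tags named in the post are recognised.
KNOWN_TAGS = {"breath", "sigh"}
TAG_PATTERN = re.compile(r"\[([a-z_]+)\]")


def split_tags(text: str) -> list[tuple[str, str]]:
    """Split TTS input into ("text", ...) and ("tag", ...) segments.

    Bracketed words that are not known tags are left inside the text.
    """
    segments: list[tuple[str, str]] = []
    pos = 0
    for match in TAG_PATTERN.finditer(text):
        name = match.group(1)
        if name not in KNOWN_TAGS:
            continue  # leave unknown brackets in the surrounding text
        before = text[pos:match.start()].strip()
        if before:
            segments.append(("text", before))
        segments.append(("tag", name))
        pos = match.end()
    rest = text[pos:].strip()
    if rest:
        segments.append(("text", rest))
    return segments
```

For example, `split_tags("Well [breath] I guess so [sigh]")` separates the two spoken phrases from the two non-speech events, which a model could then render as sounds rather than read aloud.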

The more interesting part is the thoughts stream, a second LLM call that silently decides whether the user’s request needs tools like Gmail, calendar, web search, shell commands, memories, or a custom agent.
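The thoughts stream can be pictured as a second call running alongside the spoken reply. In this sketch a keyword table stands in for the router LLM, and `thoughts_stream`, `spoken_reply`, `handle_turn`, and the tool names are hypothetical. The point is only that `asyncio.gather` runs both calls concurrently, so routing does not add a serial step to the turn.

```python
import asyncio

# Hypothetical tool registry; keyword rules stand in for the second LLM call.
TOOL_KEYWORDS = {
    "gmail": ("email", "inbox", "gmail"),
    "calendar": ("meeting", "calendar", "schedule"),
    "web_search": ("search", "look up", "news"),
    "shell": ("terminal", "command"),
    "memory": ("remember", "recall"),
}


async def thoughts_stream(request: str) -> list[str]:
    """Placeholder for the silent router call: which tools does this need?"""
    await asyncio.sleep(0.01)  # simulate model latency
    text = request.lower()
    return [tool for tool, words in TOOL_KEYWORDS.items()
            if any(word in text for word in words)]


async def spoken_reply(request: str) -> str:
    """Placeholder for the main LLM call that produces the spoken answer."""
    await asyncio.sleep(0.01)
    return f"Okay, working on: {request}"


async def handle_turn(request: str) -> tuple[str, list[str]]:
    # Run both calls concurrently so the router adds no extra turn latency.
    reply, tools = await asyncio.gather(spoken_reply(request), thoughts_stream(request))
    return reply, tools
```

With `asyncio.run(handle_turn("Check my inbox and schedule a meeting"))` the reply comes back together with the tool list `["gmail", "calendar"]`; a real router would return structured tool calls rather than names.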

Releasing vui, an open-source voice mode

300M TTS model

Runs on a single consumer GPU / Apple Silicon

Context aware speech

6 minutes of context

11:56 AM · Jun 5, 2026 · 2.5K Views
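The release post lists 6 minutes of context without saying how it is stored. Assuming context is tracked as timed segments (a hypothetical representation, as is the `RollingContext` class), a rolling window that evicts the oldest segments once the total exceeds 360 seconds could look like this:

```python
from collections import deque

CONTEXT_SECONDS = 6 * 60  # "6 minutes of context" from the release post


class RollingContext:
    """Keeps roughly the last CONTEXT_SECONDS of (duration, label) segments.

    Hypothetical representation: the post does not say how Vui stores context.
    """

    def __init__(self, limit_s: float = CONTEXT_SECONDS) -> None:
        self.limit_s = limit_s
        self.segments: deque[tuple[float, str]] = deque()
        self.total_s = 0.0

    def add(self, duration_s: float, label: str) -> None:
        self.segments.append((duration_s, label))
        self.total_s += duration_s
        # Evict the oldest segments until the window fits; the newest segment
        # is always kept, even if it alone is longer than the limit.
        while self.total_s > self.limit_s and len(self.segments) > 1:
            old_duration, _ = self.segments.popleft()
            self.total_s -= old_duration

    def labels(self) -> list[str]:
        return [label for _, label in self.segments]
```

Adding segments of 200 s, 150 s, and 30 s pushes the total to 380 s, so the first segment is evicted and 180 s of context remain.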
Sentiment

Users praise the open-source Vui local voice assistant for its sub-200ms latency, barge-in support, parallel tool routing, and fully local operation.

Rohan Paul @rohanpaul_ai

Check its GitHub:

http://github.com/fluxions-ai/vui

You can try it directly here:

http://fluxions.ai/talk
http://fluxions.ai/speech

And the official launch blog:

https://fluxions.ai/blog/vui-launch/

RDB @Rajath_DB

@rohanpaul_ai Local 200M TTS is cool, but TTS was never the bottleneck for a real-time assistant. The perceptual cliff is turn detection: once VAD fires on a mid-sentence pause, the model talks over you, and it feels broken no matter how fast the audio renders.
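To make this point concrete, here is a toy silence-only endpointer: per-frame energy decides speech versus silence, and the turn ends after a fixed run of silence. The frame size, energy threshold, and 500 ms hangover are hypothetical values, not taken from Vui or any particular VAD, and `endpoint_frame` is an illustrative helper.

```python
from typing import Optional

FRAME_MS = 20            # hypothetical frame length
ENERGY_THRESHOLD = 0.01  # hypothetical: frames above this count as speech
END_SILENCE_MS = 500     # hypothetical silence run that ends the turn


def is_speech(frame_energy: float) -> bool:
    return frame_energy > ENERGY_THRESHOLD


def endpoint_frame(energies: list[float]) -> Optional[int]:
    """Index of the frame where silence-only endpointing ends the turn, or None."""
    silence_ms = 0
    heard_speech = False
    for i, energy in enumerate(energies):
        if is_speech(energy):
            heard_speech = True
            silence_ms = 0  # any speech resets the silence counter
        elif heard_speech:
            silence_ms += FRAME_MS
            if silence_ms >= END_SILENCE_MS:
                return i
    return None
```

With one second of speech, a 600 ms thinking pause, and then more speech, `endpoint_frame([0.1] * 50 + [0.0] * 30 + [0.1] * 50)` returns 74, which falls inside the pause: the assistant would start replying while the user is still mid-sentence.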

Ines Lakzit @InesLakzit

@rohanpaul_ai Voice running real-time on a consumer GPU says a lot about where inference margins go. The commodity end of AI workloads migrates on-device and out of the cloud P&L, while pricing power concentrates in the frontier compute nobody can run locally.


@Rajath_DB @rohanpaul_ai For sure, turn-taking isn't really solved, unless I've missed something.

Johnny Yukari @JYukariHero

@rohanpaul_ai Sub-200ms, barge-in, parallel tool routing. All local, all open source. This is the voice stack to beat.

RDB @Rajath_DB

@rohanpaul_ai You haven't missed anything. VAD-only endpointing tops out fast. Smart Turn v3 helps by going semantic instead of relying on pure silence, but it still fires early on a held pause mid-sentence. Real endpointing needs to know you aren't done thinking, and nobody has that clean yet.
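The semantic approach mentioned here can be approximated by making the silence threshold depend on whether the transcript sounds finished. The word-list check below is a hand-written stand-in for a trained turn-detection model such as Smart Turn v3, not its actual interface; the cue words, the 300 ms and 1500 ms waits, and the function names are hypothetical.

```python
# Hypothetical cue words suggesting the speaker has more to say.
TRAILING_CONTINUATIONS = {"and", "but", "so", "because", "um", "uh", "the", "to"}


def looks_complete(transcript: str) -> bool:
    """Crude stand-in for a semantic turn-completion model."""
    words = transcript.lower().rstrip(" .,?!").split()
    return bool(words) and words[-1] not in TRAILING_CONTINUATIONS


def should_end_turn(transcript: str, silence_ms: int,
                    short_wait_ms: int = 300, long_wait_ms: int = 1500) -> bool:
    """End quickly if the utterance sounds finished; otherwise wait longer."""
    wait_ms = short_wait_ms if looks_complete(transcript) else long_wait_ms
    return silence_ms >= wait_ms
```

A trailing "the" holds the turn open, but a transcript that already reads as complete, such as "That sounds fine", still ends the turn after 300 ms even if the speaker is only pausing to think, which is the held-pause problem described in this reply.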
