
Open-Source Vui Launches Local Voice Assistant With 300M TTS Model


Original post

Rohan Paul @rohanpaul_ai · #1031 in AI

A new open-source voice stack just launched with a 200M TTS model built to run local, real-time voice assistants on consumer GPUs and Apple Silicon.

Vui is a fully local voice assistant you can extend to do almost anything.

It packages the full loop: mic audio goes through WebRTC, speech is transcribed by faster-whisper or Moonshine, a local LLM writes the reply, and Vui Nano streams the voice back.
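As a rough illustration of the loop described above, here is a minimal sketch that chains speech-to-text, a reply generator, and a streaming TTS stage. The class name `VoiceLoop` and the three stub functions are hypothetical stand-ins, not Vui's actual API; the WebRTC transport is left out and the audio is faked as bytes.

```python
from dataclasses import dataclass
from typing import Callable, Iterator


@dataclass
class VoiceLoop:
    """Hypothetical wiring of the three stages; not Vui's real interface."""
    transcribe: Callable[[bytes], str]             # stand-in for faster-whisper or Moonshine
    generate_reply: Callable[[str], str]           # stand-in for the local LLM
    synthesize: Callable[[str], Iterator[bytes]]   # stand-in for a streaming TTS model

    def handle_turn(self, mic_audio: bytes) -> Iterator[bytes]:
        text = self.transcribe(mic_audio)
        reply = self.generate_reply(text)
        # Yield audio chunks as they are produced instead of waiting for the full clip.
        yield from self.synthesize(reply)


# Stubs so the sketch runs without any models installed.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_reply(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> Iterator[bytes]:
    for word in text.split():
        yield word.encode("utf-8")


loop = VoiceLoop(fake_transcribe, fake_reply, fake_tts)
chunks = list(loop.handle_turn(b"hello there"))
```

Keeping each stage behind a plain callable is one way to swap the transcriber or the LLM without touching the loop itself.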

The big technical shift is context-aware prosody: the speech output changes depending on how you say it.

Supports barge-in, sub-200ms turn latency on a 4090 GPU, memory, tool routing, and an OpenAI Realtime-compatible API.
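Barge-in means playback stops as soon as the user starts talking. The sketch below shows one simple way to do that with a shared flag checked before each audio chunk; the function `play_with_barge_in` and the simulated stream are assumptions for illustration, and nothing here reflects how Vui implements it.

```python
import threading
from typing import Iterable, List, Tuple


def play_with_barge_in(chunks: Iterable[bytes],
                       user_speaking: threading.Event,
                       out: List[bytes]) -> bool:
    """Append chunks to `out` until done; return True if the user interrupted."""
    for chunk in chunks:
        if user_speaking.is_set():
            return True          # drop the rest of the reply
        out.append(chunk)
    return False


def demo() -> Tuple[bool, List[bytes]]:
    user_speaking = threading.Event()
    played: List[bytes] = []

    def tts_stream():
        for i in range(5):
            if i == 2:
                user_speaking.set()   # simulate the user talking over chunk 2
            yield f"chunk-{i}".encode()

    interrupted = play_with_barge_in(tts_stream(), user_speaking, played)
    return interrupted, played


interrupted, played = demo()
```

In a real stack the flag would be set by the voice activity detector on the microphone side, and the audio device buffer would also need flushing.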

The model was trained on 1M hours of multi-speaker audio and non-speech sounds, so it can handle breaths, hesitations, speaker rhythm, and tags like [breath] or [sigh], the subtle disfluencies that make it sound more natural.
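To make the tag idea concrete, here is a small sketch that splits a reply into spoken text and bracketed non-speech tags. The post does not say how the model consumes these tags (it may well take the raw string), so `split_tags` is purely a hypothetical preprocessing helper.

```python
import re
from typing import List, Tuple

TAG_PATTERN = re.compile(r"\[(\w+)\]")


def split_tags(text: str) -> List[Tuple[str, str]]:
    """Return ("text", ...) and ("tag", ...) segments in reading order."""
    segments: List[Tuple[str, str]] = []
    pos = 0
    for match in TAG_PATTERN.finditer(text):
        spoken = text[pos:match.start()].strip()
        if spoken:
            segments.append(("text", spoken))
        segments.append(("tag", match.group(1)))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        segments.append(("text", tail))
    return segments


segments = split_tags("[sigh] Okay, let me check. [breath] Found it.")
```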

The more interesting part is the thoughts stream, a second LLM call that silently decides whether the user’s request needs tools like Gmail, calendar, web search, shell commands, memories, or a custom agent.
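A second call that decides on tools can be sketched as two coroutines run side by side: one produces the spoken reply, the other returns a tool name or nothing. Running them concurrently is an assumption here, and the keyword matcher below is only a stub standing in for the second LLM call; the function names and tool labels are hypothetical.

```python
import asyncio
from typing import Optional, Tuple


async def spoken_reply(user_text: str) -> str:
    await asyncio.sleep(0)   # placeholder for the reply-generating LLM call
    return f"Sure, looking into: {user_text}"


async def thoughts(user_text: str) -> Optional[str]:
    """Stub router: a real system would ask an LLM instead of matching keywords."""
    await asyncio.sleep(0)
    keywords = {"email": "gmail", "meeting": "calendar", "search": "web_search"}
    lowered = user_text.lower()
    for word, tool in keywords.items():
        if word in lowered:
            return tool
    return None              # no tool needed


async def handle(user_text: str) -> Tuple[str, Optional[str]]:
    # Both calls are awaited together so routing does not delay the spoken reply.
    reply, tool = await asyncio.gather(spoken_reply(user_text), thoughts(user_text))
    return reply, tool


result = asyncio.run(handle("Do I have a meeting tomorrow?"))
```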

Harry Coultas Blum @harrycblum

Releasing vui, an open-source voice mode

300M TTS model

Runs on a single consumer GPU / Apple Silicon

Context-aware speech

6 minutes of context

11:56 AM · Jun 5, 2026 · 2.5K Views

Sentiment

Users praise the open-source Vui local voice assistant for its sub-200ms latency, barge-in support, parallel tool routing, and fully local operation.

Positive: 100.0%

Negative: 0.0%

1 comment with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 859 · Bookmarks: 1 · Likes: 4

Rohan Paul @rohanpaul_ai

Check its GitHub:

http://github.com/fluxions-ai/vui

And you can try it directly here:

http://fluxions.ai/talk http://fluxions.ai/speech

And the official launch blog:

https://fluxions.ai/blog/vui-launch/


Retweets: 2


Replies: 2

RDB @Rajath_DB · 20h

@rohanpaul_ai local 200M TTS is cool, but TTS was never the bottleneck for a real-time assistant. the perceptual cliff is turn detection. once VAD fires on a pause mid-sentence, the model talks over you, and it feels broken no matter how fast the audio renders
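The failure mode in this reply can be reproduced with a toy silence-based endpointer: it declares end of turn once it has seen enough consecutive non-speech frames, so a long pause mid-sentence triggers it. The function name, the 20 ms frame size, and the 500 ms threshold are illustrative assumptions, not values from any particular VAD.

```python
from typing import List, Optional


def silence_endpoint(frames_are_speech: List[bool],
                     frame_ms: int = 20,
                     silence_ms: int = 500) -> Optional[int]:
    """Return the frame index where end of turn is declared, or None."""
    needed = silence_ms // frame_ms
    heard_speech = False
    silent_run = 0
    for i, is_speech in enumerate(frames_are_speech):
        if is_speech:
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            silent_run += 1
            if silent_run >= needed:
                return i
    return None


# 1 s of speech, a 600 ms thinking pause, then 1 s more speech.
held_pause = [True] * 50 + [False] * 30 + [True] * 50
fired_at = silence_endpoint(held_pause)

# Same utterance with a shorter 300 ms pause.
short_pause = [True] * 50 + [False] * 15 + [True] * 50
not_fired = silence_endpoint(short_pause)
```

With the held pause the endpointer fires at frame 74, before the speaker resumes at frame 80, which is exactly the "talks over you" case. Raising the threshold avoids that but adds latency to every turn.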

Ines Lakzit @InesLakzit · 20h

@rohanpaul_ai Voice running real-time on a consumer GPU says a lot about where inference margins go. The commodity end of AI workloads migrates on-device and out of the cloud P&L, while pricing power concentrates in the frontier compute nobody can run locally.

Harry Coultas Blum @harrycblum · 20h

@Rajath_DB @rohanpaul_ai For sure, turn-taking isn't really solved, unless I’ve missed something

Johnny Yukari @JYukariHero · 9h

@rohanpaul_ai Sub-200ms, barge-in, parallel tool routing. All local, all open source. This is the voice stack to beat.

RDB @Rajath_DB · 20h

@rohanpaul_ai you haven't missed anything. VAD-only endpointing tops out fast. Smart Turn v3 helps by going semantic instead of pure silence, but it still fires early on a held pause mid-sentence. real endpointing needs to know you aren't done thinking, and nobody has that clean yet
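To illustrate the difference between pure-silence and semantic endpointing, here is a deliberately crude sketch that waits longer when the partial transcript ends on a word that usually signals more to come. It is a toy heuristic with made-up thresholds and a hand-picked word list; it is not how Smart Turn v3 or any real turn detector works.

```python
TRAILING_HOLD_WORDS = {"and", "but", "so", "because", "um", "uh", "the", "to"}


def required_silence_ms(partial_transcript: str,
                        base_ms: int = 500,
                        extended_ms: int = 1500) -> int:
    """Demand a longer pause when the last word suggests an unfinished thought."""
    words = partial_transcript.lower().split()
    if not words:
        return base_ms
    last = words[-1].strip(".,!?")
    return extended_ms if last in TRAILING_HOLD_WORDS else base_ms


def should_end_turn(partial_transcript: str, silence_ms: int) -> bool:
    return silence_ms >= required_silence_ms(partial_transcript)


unfinished = should_end_turn("I want to book a flight to", 600)
finished = should_end_turn("What's the weather today?", 600)
```

The heuristic holds on "flight to" but would still fire early on a pause after a complete-looking clause such as "I want to book a flight", which matches the limitation the reply describes.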