A new open-source voice stack just launched with a 200M TTS model built to run local, real-time voice assistants on consumer GPUs and Apple Silicon.
Vui is a fully local voice assistant you can extend to do almost anything.
It packages the full loop: mic audio goes through WebRTC, speech is transcribed by faster-whisper or Moonshine, a local LLM writes the reply, and Vui Nano streams the voice back.
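To make the loop concrete, here is a minimal sketch of one turn in Python. Everything in it is an assumption for illustration: `VoiceLoop` and the `fake_*` stages are hypothetical stand-ins, not Vui's actual API, and the WebRTC transport is left out entirely.

```python
from dataclasses import dataclass
from typing import Callable, Iterator


@dataclass
class VoiceLoop:
    """Hypothetical wiring of the three stages; not Vui's real interface."""
    transcribe: Callable[[bytes], str]            # stands in for faster-whisper or Moonshine
    generate: Callable[[str], str]                # stands in for the local LLM
    synthesize: Callable[[str], Iterator[bytes]]  # stands in for the streaming TTS model

    def run_turn(self, mic_audio: bytes) -> Iterator[bytes]:
        text = self.transcribe(mic_audio)   # speech -> text
        reply = self.generate(text)         # text -> reply text
        yield from self.synthesize(reply)   # reply text -> streamed audio chunks


# Stub stages so the sketch runs without any models installed.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_generate(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_synthesize(text: str) -> Iterator[bytes]:
    for word in text.split():
        yield word.encode("utf-8")  # one "audio chunk" per word


loop = VoiceLoop(fake_transcribe, fake_generate, fake_synthesize)
chunks = list(loop.run_turn(b"hello there"))
```

Because `run_turn` is a generator, playback can start on the first chunk rather than waiting for the whole reply, which is the point of streaming the voice back.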
The big technical shift is context-aware prosody: the speech output changes depending on how you speak.
It supports barge-in, sub-200ms turn latency on a 4090 GPU, memory, tool routing, and an OpenAI Realtime-compatible API.
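Barge-in means the user can start talking and cut the assistant off mid-reply. A rough sketch of the idea, assuming a hypothetical `play_with_barge_in` helper and a flag set by some voice-activity detector (neither is taken from Vui's code):

```python
import threading
from typing import Callable, Iterable


def play_with_barge_in(
    chunks: Iterable[bytes],
    user_speaking: threading.Event,
    play: Callable[[bytes], None],
) -> int:
    """Play audio chunks until the user starts speaking; return chunks played."""
    played = 0
    for chunk in chunks:
        if user_speaking.is_set():  # user interrupted: stop before the next chunk
            break
        play(chunk)
        played += 1
    return played


# Simulated playback where the "user" interrupts after the second chunk.
interrupt = threading.Event()
heard = []


def fake_play(chunk: bytes) -> None:
    heard.append(chunk)
    if len(heard) == 2:
        interrupt.set()


played_count = play_with_barge_in([b"a", b"b", b"c", b"d"], interrupt, fake_play)
```

Checking the flag between chunks keeps the interruption delay bounded by the length of one chunk, so smaller chunks give a snappier cutoff.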
The model was trained on 1M hours of multi-speaker audio and non-speech sounds, so it can handle breaths, hesitations, speaker rhythm, and tags like [breath] or [sigh]: the subtle disfluencies that make it sound more natural.
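As an illustration of how bracketed tags could be separated from spoken text before synthesis, here is a small sketch. The `split_tags` helper and the regex are assumptions; how the model actually consumes these tags is not described in the announcement.

```python
import re

# Matches bracketed tags such as [breath] or [sigh].
TAG_PATTERN = re.compile(r"\[(\w+)\]")


def split_tags(text: str) -> list[tuple[str, str]]:
    """Split text into ("text", ...) and ("tag", ...) segments, in order."""
    segments = []
    pos = 0
    for match in TAG_PATTERN.finditer(text):
        spoken = text[pos:match.start()].strip()
        if spoken:
            segments.append(("text", spoken))
        segments.append(("tag", match.group(1)))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        segments.append(("text", tail))
    return segments


segments = split_tags("[sigh] Okay, let me check. [breath] Found it.")
```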
The more interesting part is the thoughts stream, a second LLM call that silently decides whether the user’s request needs tools like Gmail, calendar, web search, shell commands, memories, or a custom agent.
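The routing step might look something like the sketch below. It assumes the second LLM call returns a JSON list of tool names, which is a guess at the contract rather than Vui's documented behavior; `route_request`, `KNOWN_TOOLS`, and the keyword-based `stub_thoughts_llm` are all hypothetical.

```python
import json
from typing import Callable

# Hypothetical tool names mirroring the examples in the post.
KNOWN_TOOLS = {"gmail", "calendar", "web_search", "shell", "memories", "custom_agent"}


def route_request(request: str, thoughts_llm: Callable[[str], str]) -> list[str]:
    """Ask the side-channel model which tools are needed; keep only known ones."""
    raw = thoughts_llm(request)
    try:
        proposed = json.loads(raw)
    except json.JSONDecodeError:
        return []  # unparseable output: fall back to a plain reply with no tools
    if not isinstance(proposed, list):
        return []
    return [t for t in proposed if isinstance(t, str) and t in KNOWN_TOOLS]


def stub_thoughts_llm(request: str) -> str:
    """Keyword stub standing in for the real second LLM call."""
    lowered = request.lower()
    tools = []
    if "email" in lowered:
        tools.append("gmail")
    if "meeting" in lowered:
        tools.append("calendar")
    return json.dumps(tools)


needed = route_request("Any email about tomorrow's meeting?", stub_thoughts_llm)
```

Filtering against an allowlist means a hallucinated tool name is dropped silently instead of being executed.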
Releasing Vui, an open-source voice mode
300M TTS model
Runs on a single consumer GPU / Apple Silicon
Context-aware speech
6 minutes of context



