played around with building a voice agent on @cartesia this morning.
a few things stood out for anyone running support or booking lines.
it waits until you've actually finished talking before it responds.
turn detection is built into the model, so it doesn't cut people off mid-sentence the way most agents do.
it reads phone numbers and confirmation codes back cleanly, which is usually where support bots embarrass themselves.
under 100ms to first audio means no awkward gap before it answers. very hard to tell apart from a human's response time.
the whole stack is two models, sonic 3.5 for speech and ink 2 for transcription, both ranked #1 in their category on artificial analysis right now.
if you've got customers stuck on hold or stuck repeating their account number, it's worth checking out 👇