Cartesia Ink-2 debuts as #1 for accuracy on the brand-new streaming speech-to-text leaderboard from @ArtificialAnlys! We designed Ink-2 from the ground up for voice agents - with low latency, eager transcripts, and semantic endpointing.
Cartesia releases Ink-2, a streaming speech-to-text model that tops the Artificial Analysis leaderboard for accuracy
Built-in semantic endpoints detect when a user finishes speaking
Supporters hail Cartesia Ink-2's top spot on the streaming speech-to-text accuracy leaderboard as an impressive breakthrough from a tiny team, while critics note that voice agents still feel laggy in practice.
With the release of Cartesia’s newest model, I can finally say:
At Cartesia, we’re listening. 👂
Our new model Ink-2 tops AA's leaderboard for streaming speech-to-text!
Ink-2 comes with plenty of features optimized for real-time voice agents. With top-class models for both TTS and STT, the team at @cartesia keeps pushing the frontier of models for interactive intelligence.
🧵 on some fun insider details on ink-2 😼
Our new speech-to-text model Ink-2 is out and #1 on Artificial Analysis.
It’s built for streaming: low latency, a fast eager mode, and built-in semantic endpoints to detect when users are done talking.
New architectures & algorithms made this Pareto-dominance possible

@ArtificialAnlys We've built Ink-2 to excel on all of these axes in production.
Give it a try on our website https://www.cartesia.ai/ink

A great speech-to-text model for voice agents first and foremost needs to have high accuracy in production settings - this means noisy environments and conventionally difficult audio like silences, short transcripts, phone numbers, and UUIDs.
For the conversation to be smooth, it also needs to have low latency with eager transcripts to reduce end-to-end response time.
Finally, semantic endpointing with high accuracy is critical so that voice agents respond appropriately and don't interrupt the user.
@_albertgu @cartesia tested it out on @thursdai_pod yesterday! It's kind of incredible! Kudos

@KabirGoel
@_albertgu @cartesia It's really fast Albert! Will cover on @thursdai_pod 🫡

@chongz we gotta train a vision model so we can say, “we're your eyes and ears”

First - to fully appreciate how smart this model is, you really gotta try it out here: https://www.cartesia.ai/ink

This model is a pretty interesting new architecture for ASR, and we're excited about how smart it feels. It's able to get hard contextual things correct that require real-world knowledge, like "Leicester City" (leh-ster) or the cyclist "Tadej Pogačar" (tah-day poh-gah-char).

Ink-2 is still new and there's still a ton of research happening! Expect some of these WER and latency numbers to improve even more over the coming weeks. We're also working on multilingual support and features like keyword boosting. If you have feedback for us, my DMs are open!

@cartesia @ArtificialAnlys whoa, this is really impressive (though English only!) Will add to @thursdai_pod newsletter today

One thing the AA benchmark doesn't measure yet is turn-taking. We spent a lot of time optimizing Ink-2 for this, and the latency, precision, and recall are SoTA! It's also very semantically smart: even if you pause for a while mid-sentence, it won't cut you off like other models.

@elipughresearch of course you put tadej into the eval set ...

@_albertgu @cartesia who is gu

@cartesia @ArtificialAnlys Huge breakthrough by the team 🤩

@krandiash and new branding too!