Cartesia releases Ink-2, a streaming speech-to-text model that tops the Artificial Analysis leaderboard for accuracy

kabir@KabirGoel

With the release of Cartesia’s newest model, I can finally say:

At Cartesia, we’re listening. 👂

Cartesia Ink-2 debuts as #1 for accuracy on the brand-new streaming speech-to-text leaderboard from @ArtificialAnlys! We designed Ink-2 from the ground up for voice agents - with low latency, eager transcripts, and semantic endpointing.

Albert Gu@_albertgu

Our new model Ink-2 tops AA's leaderboard for streaming speech-to-text!

Ink-2 comes with plenty of features optimized for real-time voice agents. With top-class models for both TTS and STT, the team at @cartesia keeps pushing the frontier of models for interactive intelligence.

Cartesia@cartesia

Cartesia Ink-2 debuts as #1 for accuracy on the brand-new streaming speech-to-text leaderboard from @ArtificialAnlys! We designed Ink-2 from the ground up for voice agents - with low latency, eager transcripts, and semantic endpointing.

Eli@elipughresearch

🧵 on some fun insider details on ink-2 😼

Karan Goel@krandiash

Our new speech-to-text model Ink-2 is out and #1 on Artificial Analysis.

It’s built for streaming: low latency, a fast eager mode, and built-in semantic endpointing to detect when users are done talking.

New architectures & algorithms made this Pareto-dominance possible

Cartesia@cartesia

@ArtificialAnlys We've built Ink-2 to excel on all of these axes in production.

Give it a try on our website https://www.cartesia.ai/ink

Cartesia@cartesia

A great speech-to-text model for voice agents first and foremost needs to have high accuracy in production settings - this means noisy environments and conventionally difficult audio like silences, short transcripts, phone numbers, and UUIDs.

For the conversation to be smooth, it also needs low latency, with eager transcripts to reduce end-to-end response time.

Finally, highly accurate semantic endpointing is critical so that voice agents respond appropriately and don't interrupt the user.

Alex Volkov @ AI Engineer@altryne

@_albertgu @cartesia tested it out on @thursdai_pod yesterday! It's kind of incredible! Kudos

Timothy Luong (Chongz)@chongz

@KabirGoel

Alex Volkov @ AI Engineer@altryne

@_albertgu @cartesia It's really fast Albert! Will cover on @thursdai_pod 🫡

kabir@KabirGoel

@chongz we gotta train a vision model so we can say, “we're your eyes and ears”

Eli@elipughresearch

First - to fully appreciate how smart this model is, you really gotta try it out here: https://www.cartesia.ai/ink

Eli@elipughresearch

This model uses a pretty interesting new architecture for ASR, and we're excited by how smart it feels. It's able to get hard contextual things correct that require real-world knowledge, like "Leicester City" (leh-ster) or cyclist "Tadej Pogačar" (tah-day poh-gah-char).

Eli@elipughresearch

Ink-2 is still new and there's still a ton of research happening! Expect some of these WER and latency numbers to improve even more over the coming weeks. We're also working on multilingual support and features like keyword boosting. If you have feedback for us, my DMs are open!

Alex Volkov@altryne

@cartesia @ArtificialAnlys whoa, this is really impressive (though English only!). Will add to @thursdai_pod newsletter today

Eli@elipughresearch

One thing the AA benchmark doesn't measure yet is turn-taking. We spent a lot of time optimizing Ink-2 for this, and its latency, precision, and recall are SoTA! It's also very semantically smart: even if you pause for a while mid-sentence, it won't cut you off like other models.
