/Tech · 11h ago

Cohere co-founder Nick Frosst says Cohere-Transcribe took first place on the new Far-Field Automatic Speech Recognition leaderboard

The Apache 2.0-licensed model generalized to the unseen benchmark.


Original post

Nick Frosst @nickfrosst · #658 in Tech

New leaderboard for audio transcription just launched and our Apache 2.0 Cohere-Transcribe is at the top. This eval didn't exist when we trained the model, so it's nice to see us do so well on it.

https://huggingface.co/spaces/treble-technologies/ffasr

4:11 AM · Jun 10, 2026 · 7.3K Views

Sentiment

Users are pleased that Cohere Transcribe topped the new audio transcription and far-field ASR benchmarks, saying the result validates earlier, non-trendy work, sustains Cohere's momentum, and comes from a small, open-source model.

Positive: 100.0% · Negative: 0.0%

4 comments with sentiment.

Cluster Engagement

Posts from X

Most activity: 506 views · 3 bookmarks · 10 likes

Cohere @cohere

In March, Transcribe topped the OpenASR leaderboard for general-purpose speech recognition. Today, it leads a benchmark designed to go further and test robustness in real-world, far-field audio environments.

Give it a try and share back what you build: https://huggingface.co/CohereLabs/cohere-transcribe-03-2026

2h ago

Most retweets: 8

Cohere @cohere

Cohere Transcribe, our open-source speech recognition model, is #1 on the new @huggingface Far-Field ASR benchmark.

2h ago

Most replies: 1

Cohere @cohere

These tests measure performance under varying signal-to-noise conditions: the kinds of audio found in meeting rooms, contact centres, and phone calls.

In other words, environments where enterprise speech applications actually operate. Cohere Transcribe ranked #1 across every metric.

2h ago
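The benchmark's conditions are described in terms of signal-to-noise ratio (SNR): the ratio of speech power to noise power in decibels, SNR = 10·log10(P_signal / P_noise), so every 10 dB drop means ten times more noise power relative to the speech. Below is a minimal sketch of how a clean clip can be degraded to a chosen SNR. The function names and the tone-plus-Gaussian-noise stand-ins are hypothetical, and the thread does not say how Treble actually builds its far-field test audio.

```python
import math
import random


def power(samples):
    """Mean power (average squared amplitude) of a list of samples."""
    return sum(x * x for x in samples) / len(samples)


def snr_db(signal, noise):
    """SNR in decibels: 10 * log10(P_signal / P_noise)."""
    return 10.0 * math.log10(power(signal) / power(noise))


def mix_at_snr(signal, noise, target_db):
    """Scale `noise` so that signal + noise has the requested SNR, then mix."""
    # Target noise power is P_signal / 10^(target_db / 10); solve for the gain.
    gain = math.sqrt(power(signal) / (power(noise) * 10.0 ** (target_db / 10.0)))
    return [s + gain * n for s, n in zip(signal, noise)]


if __name__ == "__main__":
    rng = random.Random(0)
    sr = 16_000  # one second of audio at 16 kHz
    # Hypothetical stand-ins: a 220 Hz tone for speech, Gaussian noise for the room.
    speech = [math.sin(2 * math.pi * 220 * i / sr) for i in range(sr)]
    room_noise = [rng.gauss(0.0, 1.0) for _ in range(sr)]
    for target in (20.0, 10.0, 0.0):
        noisy = mix_at_snr(speech, room_noise, target)
        residual = [y - s for y, s in zip(noisy, speech)]
        print(f"target {target:5.1f} dB -> measured {snr_db(speech, residual):5.1f} dB")
```

Sweeping `target_db` downward is one simple way to produce the "worse conditions" that commenters below refer to; real far-field audio also involves reverberation, which additive noise alone does not model.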

Cohere @cohere

Transcribe achieved a word error rate (WER) of 17.9, nearly 2 points ahead of IBM Granite Speech and 3.6 points ahead of NVIDIA's Parakeet (lower is better).

Still Apache 2.0 and runs on your laptop. Enterprise performance 🤝 developer ergonomics.

Full results: https://huggingface.co/spaces/treble-technologies/ffasr

2h ago
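WER counts the word substitutions (S), deletions (D), and insertions (I) needed to turn a transcript into the reference, divided by the number of reference words N: WER = (S + D + I) / N. It is usually reported as a percentage, so "points ahead" means an absolute gap in percentage points. A minimal sketch using word-level edit distance follows; leaderboards typically normalize text (casing, punctuation) before scoring and this sketch does not, so it is illustrative rather than the Far-Field ASR leaderboard's own scoring code.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution or match
            )
    return dp[len(ref)][len(hyp)] / len(ref)
```

For example, `wer("the meeting starts at noon", "the meeting start at noon")` returns 0.2: one substitution over five reference words, i.e. 20 WER on the scale used above.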

Cohere @cohere

Want to learn more about the Far-Field ASR benchmark?

Join Treble's webinar tomorrow, June 11th, with @shinjiw_at_cmu, Cohere's @Julianfmack, and other industry leaders discussing the future of far-field speech recognition.

2h ago

Latent Local @latentlocal

@nickfrosst Nice, keeping that momentum after 30b.

8h ago

Tuning Engines: Enforcement Layer for Enterprise AI @tuning_engines

@cohere @huggingface You folks are cooking! Thanks for open-sourcing it. We will add it to the models we host.

2h ago

Furkan Gözükara @FurkanGozukara

@cohere @victormustar @huggingface Can it produce word-level timestamps? If not, what segment durations was it trained on, i.e., how long are the sentences it generates?

A block of text is not useful for subtitles.

1h ago
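The subtitle concern comes down to timing: captions need each line tied to a start and end time. The thread does not say whether Cohere Transcribe emits word-level timestamps, so the sketch below assumes some source (the model itself or a separate forced aligner) supplies hypothetical `(word, start_seconds, end_seconds)` tuples, and shows how they could be grouped into SubRip (SRT) cues. The helper names and the 42-character / 5-second limits are illustrative choices, not anything from the source.

```python
from typing import List, Tuple

# Hypothetical input format: (text, start_seconds, end_seconds) per word.
Word = Tuple[str, float, float]


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_srt(words: List[Word], max_chars: int = 42, max_duration: float = 5.0) -> str:
    """Greedily pack timed words into cues that respect length and duration limits."""
    cues: List[List[Word]] = []
    current: List[Word] = []
    for word in words:
        if current:
            text_len = len(" ".join(w[0] for w in current + [word]))
            duration = word[2] - current[0][1]
            # Start a new cue if adding this word would break either limit.
            if text_len > max_chars or duration > max_duration:
                cues.append(current)
                current = []
        current.append(word)
    if current:
        cues.append(current)

    blocks = []
    for idx, cue in enumerate(cues, start=1):
        text = " ".join(w[0] for w in cue)
        blocks.append(f"{idx}\n{srt_time(cue[0][1])} --> {srt_time(cue[-1][2])}\n{text}\n")
    # SRT separates cues with a blank line.
    return "\n".join(blocks)
```

A greedy packer like this is simple and predictable; a production captioner would usually also break at punctuation or pauses rather than purely on length.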

Isabelle Plante @Izzyplante

@nickfrosst Glad to see that Cohere is releasing small open-source models like this.

7h ago

Simon Bissonnette @SimonBiss

@cohere @huggingface Might try this out to help a call center in Quebec; I hope it works well in French. What kind of hardware would be required for near real-time translation for, let's say, 10 to 20 users? And if we use the API, can you guarantee confidential handling of information?

1h ago
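Sizing hardware for live audio is usually a back-of-envelope calculation from the real-time factor (RTF): processing time divided by audio duration, measured on your own hardware. A worker with RTF r can in principle keep up with about 1/r simultaneous live streams. The sketch below is a hypothetical helper; the RTF values and the 70% headroom are placeholders, not measured Cohere Transcribe figures (the thread gives none), and it ignores batching gains, network latency, French accuracy, and the data-handling question.

```python
import math


def workers_needed(concurrent_streams: int, rtf: float, headroom: float = 0.7) -> int:
    """Estimate how many model workers are needed to keep up with live audio.

    rtf: processing time / audio duration on your hardware (must be < 1 for live use).
    headroom: fraction of theoretical capacity you plan to use, leaving slack for bursts.
    """
    if not 0 < rtf < 1:
        raise ValueError("rtf must be between 0 and 1 for real-time use")
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    # One worker processes 1/rtf seconds of audio per wall-clock second.
    streams_per_worker = headroom / rtf
    return math.ceil(concurrent_streams / streams_per_worker)
```

For example, if your own measurements showed an RTF of 0.1, `workers_needed(20, 0.1)` returns 3 for the upper end of the 10 to 20 user range.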

The Weird Canadian @Weird_Canadian

@cohere @huggingface And how much did this cost Canadian taxpayers to build?

2h ago

Guilherme O'Tina @guilhermeotina

The margin grows as conditions get worse. In low SNR, Cohere is ~4 WER points ahead of IBM, but in near-field they're basically tied. The asymmetric encoder/decoder split makes sense for this: most compute goes to acoustic features, and the light decoder keeps latency down. Curious how this runs on device vs. cloud.

2h ago

Rugbist @rugbist_

@nickfrosst Kind of satisfying when the metric catches up to the work already done.

Proof you built the right thing, not the trendy thing.

11h ago

Alex YGift @Radipdegen

@nickfrosst undetected ecov did transcribe audio before, and the Apache 2.0 one is stronger.

11h ago