
Cohere releases Command A+, its most powerful large language model to date, as open-source Apache 2.0 software that runs on two H100 GPUs with 30% lower latency

Cohere co-founder Ivan Zhang highlighted efficiency and accessibility design choices.

Original post

Introducing: Cohere Command A+. We’ve created our most powerful LLM yet, optimized it to run on as little hardware as possible, and released it open-source for all.

8:26 AM · May 20, 2026
Reposted by

Cohere is on such a great open-source trajectory lately. Beautiful Apache 2.0 model! https://huggingface.co/CohereLabs/command-a-plus-05-2026-bf16

Nick Frosst @nickfrosst

Command A+ from @cohere is out now :) it's our best model yet and it's open source, Apache 2.0

4:18 PM · May 20, 2026 · 96.4K Views
7:22 PM · May 20, 2026 · 24.3K Views

@nickfrosst @cohere Congratulations on this milestone! Looks great 👍

9:36 PM · May 20, 2026 · 469 Views

It's been *almost* a bit quiet around LLM architecture releases in the past two weeks 😅

An interesting tidbit is the parallel block design. Per the Command A tech report, it gives "equivalent performance but significant improvement in throughput compared to the vanilla transformer block."

11:26 PM · May 20, 2026 · 30.1K Views
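
The parallel block design mentioned here can be sketched as follows. This is a minimal illustration, not Cohere's code: it assumes a single shared LayerNorm feeding both sublayers, as in GPT-J, and uses hypothetical toy elementwise functions as stand-ins for the attention and MLP/MoE sublayers.

```python
from typing import Callable, List

Vector = List[float]
Sublayer = Callable[[Vector], Vector]


def add(*vectors: Vector) -> Vector:
    """Elementwise sum of equal-length vectors."""
    return [sum(parts) for parts in zip(*vectors)]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """LayerNorm without learned scale/bias, kept minimal for the sketch."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / (var + eps) ** 0.5 for v in x]


def sequential_block(x: Vector, attn: Sublayer, mlp: Sublayer) -> Vector:
    """Vanilla pre-norm block: the MLP has to wait for the attention output."""
    h = add(x, attn(layer_norm(x)))
    return add(h, mlp(layer_norm(h)))


def parallel_block(x: Vector, attn: Sublayer, mlp: Sublayer) -> Vector:
    """Parallel block: both sublayers read the same normalized input,
    so they can run concurrently; their outputs join one residual add."""
    n = layer_norm(x)
    return add(x, attn(n), mlp(n))


# Hypothetical toy stand-ins: not real attention or MoE layers.
def toy_attn(v: Vector) -> Vector:
    return [0.5 * t for t in v]


def toy_mlp(v: Vector) -> Vector:
    return [max(t, 0.0) for t in v]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    print("sequential:", sequential_block(x, toy_attn, toy_mlp))
    print("parallel:  ", parallel_block(x, toy_attn, toy_mlp))
```

Because both sublayers in `parallel_block` read the same `layer_norm(x)`, neither waits on the other within a layer, which is plausibly where the throughput gain comes from; the trade-off is that the MLP no longer sees that layer's attention output.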

@eliebakouch It’s interesting. I’ve seen hidden -> wider -> hidden FFNs and hidden -> narrower -> hidden FFNs. But Command A+ seems to use 4096 -> 4096 -> 4096 for each expert FFN, which I haven’t seen before (as far as I remember)

elie @eliebakouch

@rasbt the query head x head dim being super wide is quite different from other models as well afaik

12:05 AM · May 21, 2026 · 676 Views
12:21 AM · May 21, 2026 · 553 Views
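
The width arithmetic behind these two comments can be checked with a few lines of Python. This sketch assumes a hidden size of 4096 (read off the "4096 -> 4096 -> 4096" expert shape), takes the 4x query width from elie's recap and the 128 query heads from another tweet in this thread; the function names are hypothetical and biases are ignored.

```python
from typing import Tuple


def ffn_params(d_model: int, d_ff: int, gated: bool = False) -> int:
    """Weight count (biases ignored) of one FFN: d_model -> d_ff -> d_model.

    gated=True counts a SwiGLU-style FFN with separate gate and up projections.
    """
    input_projections = 2 if gated else 1
    return input_projections * d_model * d_ff + d_ff * d_model


def query_layout(d_model: int, num_heads: int, q_ratio: int) -> Tuple[int, int]:
    """Return (total query width, per-head dim) when q width = q_ratio * d_model."""
    q_dim = q_ratio * d_model
    if q_dim % num_heads:
        raise ValueError("query width must divide evenly across heads")
    return q_dim, q_dim // num_heads


if __name__ == "__main__":
    d = 4096  # assumed hidden size, inferred from the expert FFN shape
    print("square expert FFN:    ", ffn_params(d, d))
    print("square, gated:        ", ffn_params(d, d, gated=True))
    print("conventional 4x wider:", ffn_params(d, 4 * d))
    print("query width, head dim:", query_layout(d, num_heads=128, q_ratio=4))
```

Under these assumptions, a square expert carries a quarter of the weights of a conventional 4x-wide FFN at the same hidden size (about 33.6M vs 134.2M, ungated), and a 16384-wide query projection split over 128 heads gives a head dim of 128.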

Our first fully open source Apache 2 model :)

4:51 PM · May 20, 2026 · 7.5K Views

Nick really championed us going Apache 2 for this release and for Cohere Transcribe. Not an obvious decision and one that required many discussions. Like Nick says, I hope the model is more useful and empowering as a result.

8:33 PM · May 20, 2026 · 7.3K Views

Very cool to see that cohere is continuing to use the parallel attention and MLP setup @AiEleuther introduced

8:50 PM · May 20, 2026 · 1.4K Views

Check out our latest open-source model, built for efficiency, with a focus on business use-cases, available for all.

7:41 PM · May 20, 2026 · 2.7K Views

What a week to start at Cohere!

3:45 PM · May 20, 2026 · 1.1K Views

@nickfrosst @cohere hell yeah! model looks great

4:38 PM · May 20, 2026 · 1.8K Views

Cohere has fallen, DS-MoE shape reigns supreme

11:27 PM · May 20, 2026 · 12.8K Views

@nickfrosst @cohere Very cool release. Much needed in the space.

1:23 AM · May 21, 2026 · 407 Views

@nickfrosst @cohere great work

9:54 PM · May 20, 2026 · 813 Views

Open source Command A+ model

This tech can go one of two ways. It can go the way the internet and mobile phones did, where technological hegemony resulted in a mostly disempowering technology.

Or it can empower the people that use it.

We are working towards that second one.

4:12 PM · May 20, 2026 · 4.1K Views

> be cohere
> join forces with some German companies
> immediately open source your best model
> life is good

Maziyar PANAHI @MaziyarPanahi

wait… did Cohere just release Command A+ models under Apache 2.0 for the first time ever?! 🙊 welcome to Europe! 🤗

7:47 PM · May 20, 2026 · 29.8K Views
8:35 PM · May 20, 2026 · 8.2K Views

Cohere dropped Command A+ 🔥

> 25B/219B MoE vision language model
> supports 48 languages with efficient tokenizer
> tool-calling/agentic + 128k context window
> transformers day-0 support 🤗
> free license 💗

3:56 PM · May 20, 2026 · 3.6K Views

try their demo and check the models https://huggingface.co/collections/CohereLabs/command-a-plus

3:57 PM · May 20, 2026 · 992 Views

Out today! Our most capable agentic model:
- Runs on one B200
- 48 languages (including العربية, 日本語, 한국어)
- Open source (Apache 2.0)
- Multimodal: text + images
- 218B Mixture-of-Experts model, 25B active parameters

5:06 PM · May 20, 2026 · 6.3K Views

@aidangomez congrats on the release, would love to see a tech report here, lots of interesting/unusual choices

elie @eliebakouch

interesting open model by cohere with lots of unusual architecture choices, here is a recap:
> parallel transformer, so MoE and attention are computed in parallel. likely doing some kind of MLP/attention disaggregation here?
> lots of query heads, query total dim is 4x hidden size
> big shared expert, 4x router size
> no scaling after normalization of the top k
> LayerNorm instead of RMSNorm
> 32 layers only, no dense layer at the start

8:36 PM · May 20, 2026 · 24.9K Views
8:49 PM · May 20, 2026 · 1.3K Views
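
The LayerNorm vs. RMSNorm point in the recap comes down to two extra steps. Below is the textbook form of each as a sketch; the thread does not say whether Command A+'s LayerNorm keeps a learned bias, so `beta` is optional here and defaults to zero.

```python
import math
from typing import List, Optional

Vector = List[float]


def layer_norm(x: Vector, gamma: Optional[Vector] = None,
               beta: Optional[Vector] = None, eps: float = 1e-5) -> Vector:
    """LayerNorm: subtract the mean, divide by the std, then scale and shift."""
    n = len(x)
    gamma = gamma if gamma is not None else [1.0] * n
    beta = beta if beta is not None else [0.0] * n
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    return [g * (v - mean) * inv_std + b for v, g, b in zip(x, gamma, beta)]


def rms_norm(x: Vector, gamma: Optional[Vector] = None, eps: float = 1e-5) -> Vector:
    """RMSNorm: no mean subtraction and no shift; divide by the RMS and scale."""
    n = len(x)
    gamma = gamma if gamma is not None else [1.0] * n
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / n + eps)
    return [g * v * inv_rms for v, g in zip(x, gamma)]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    print("LayerNorm:", layer_norm(x))  # output is re-centered to zero mean
    print("RMSNorm:  ", rms_norm(x))    # output keeps the input's mean offset
```

For inputs that already have zero mean the two coincide; RMSNorm is cheaper because it skips the mean reduction, which is why it became the common default that this choice departs from.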

@torchcompiled yeah actually at training it's pretty nice as well you're right

Ethan @torchcompiled

@eliebakouch Have seen a few parallelized MLP and attention cases, I wanna say one of the flux/SD models did that? My understanding is it’s a small perf hit but with proper parallelization can be a cozy inference and training speed up

8:46 PM · May 20, 2026 · 773 Views
8:48 PM · May 20, 2026 · 567 Views

also they have a "use_parallel_embedding" flag in the config? engram next?

9:15 PM · May 20, 2026 · 1K Views

@nrehiew_ yes it's like the "older version" of this basically, the one from PaLM etc.

wh @nrehiew_

@eliebakouch Similar to this? This parallel arch basically halves layer count. The 1/2 weight and router stuff is different though

11:51 PM · May 20, 2026 · 419 Views
12:01 AM · May 21, 2026 · 272 Views

prettier version of this visualization here if you are seeing this now

12:06 AM · May 21, 2026 · 1K Views

@aidangomez A M A Z I N G !!!!!!

5:01 PM · May 20, 2026 · 288 Views

@1vnzh @aidangomez real ones know ✊

Ivan Zhang @1vnzh

@aidangomez transcribe erasure bro

4:52 PM · May 20, 2026 · 491 Views
5:26 PM · May 20, 2026 · 104 Views

4 shared experts with 8 routed experts active? so 12/132, that's crazy, i wonder why. most papers, like Towards Greater Leverage, would suggest 1 shared expert or a minimal one (i think we should eventually decouple shared expert size anyway)

also, 128 attention heads with GQA???

5:24 PM · May 20, 2026 · 11.6K Views
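
The routing pattern described here (4 always-on shared experts plus the top 8 of, presumably, 128 routed experts, per the 12/132 count) can be sketched as follows. This is a hypothetical illustration, not Cohere's implementation. The recap's "no scaling after normalization of the top k" is read here as: top-k weights are renormalized to sum to 1 and applied as-is, with no extra multiplier like the routed scaling factor in DeepSeek-V3's config; `routed_scale` exposes that knob. Router logits are passed in directly rather than computed by a learned router.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(logits: Sequence[float], k: int,
                routed_scale: float = 1.0) -> List[Tuple[int, float]]:
    """Pick the k highest-probability experts and renormalize their weights.

    routed_scale=1.0 means no extra scaling after the top-k renormalization.
    """
    probs = softmax(logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, routed_scale * probs[i] / norm) for i in top]


def moe_forward(x: Vector, router_logits: Sequence[float],
                routed: Sequence[Expert], shared: Sequence[Expert],
                k: int, routed_scale: float = 1.0) -> Vector:
    """Shared experts run on every token; routed experts only if selected."""
    out = [0.0] * len(x)
    for expert in shared:
        out = [o + y for o, y in zip(out, expert(x))]
    for i, w in top_k_route(router_logits, k, routed_scale):
        out = [o + w * y for o, y in zip(out, routed[i](x))]
    return out


if __name__ == "__main__":
    identity: Expert = lambda v: list(v)  # hypothetical stand-in expert
    logits = [float(i % 7) for i in range(128)]  # made-up router scores
    y = moe_forward([1.0, -1.0], logits, [identity] * 128, [identity] * 4, k=8)
    print(y)
```

With 4 shared and 8 routed experts active, each token passes through 12 expert FFNs per layer, which is the 12/132 ratio being questioned here.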

We ❤️ Open Source

3:40 PM · May 20, 2026 · 2.6K Views

@rasbt it’s from gpt-j https://github.com/kingoflolz/mesh-transformer-jax

8:54 AM · May 21, 2026 · 14 Views
