5h ago

Cohere releases Command A+, its most powerful large language model to date, as open-source Apache 2.0 software that runs on two H100 GPUs with 30% lower latency

Cohere co-founder Ivan Zhang highlighted efficiency and accessibility design choices.

Original post

Introducing: Cohere Command A+ We’ve created our most powerful LLM yet, optimized it to run on as little hardware as possible, and released it open-source for all.

8:26 AM · May 20, 2026

Cohere is on such a great open-source trajectory lately. Beautiful Apache 2.0 model! https://huggingface.co/CohereLabs/command-a-plus-05-2026-bf16

Nick Frosst @nickfrosst

Command A+ from @cohere is out now :) its our best model yet and its open source apache 2.0

4:18 PM · May 20, 2026 · 32.6K Views
7:22 PM · May 20, 2026 · 12.3K Views

Our first fully open source Apache 2 model :)

Cohere @cohere

Introducing: Cohere Command A+ We’ve created our most powerful LLM yet, optimized it to run on as little hardware as possible, and released it open-source for all.

3:26 PM · May 20, 2026 · 96.6K Views
4:51 PM · May 20, 2026 · 4.3K Views

Nick really championed us going Apache 2 for this release and for Cohere Transcribe. Not an obvious decision and one that required many discussions. Like Nick says, I hope the model is more useful and empowering as a result.

8:33 PM · May 20, 2026 · 1.5K Views

Very cool to see that cohere is continuing to use the parallel attention and MLP set up @AiEleuther introduced

8:50 PM · May 20, 2026 · 177 Views

Check out our latest open-source model, built for efficiency, with a focus on business use-cases, available for all.

7:41 PM · May 20, 2026 · 1.2K Views

What a week to start at Cohere!

3:45 PM · May 20, 2026 · 622 Views

@nickfrosst @cohere hell yeah! model looks great

4:38 PM · May 20, 2026 · 1.1K Views

Open source Command A+ model

This tech can go one of two ways. It can go the way the internet and mobile phones did - in which technological hegemony resulted in a mostly disempowering tech.

Or it can empower the people that use it.

We are working towards that second one.

4:12 PM · May 20, 2026 · 1.7K Views

> be cohere
> join forces with some German companies
> immediately open source your best model
> life is good

Maziyar PANAHI @MaziyarPanahi

wait… did Cohere just release Command A+ models under Apache 2.0 for the first time ever?! 🙊 welcome to Europe! 🤗

7:47 PM · May 20, 2026 · 10K Views
8:35 PM · May 20, 2026 · 990 Views

Cohere dropped Command A+ 🔥

> 25B/219B MoE vision language model
> supports 48 languages with efficient tokenizer
> tool-calling/agentic + 128k context window
> transformers day-0 support 🤗 free license 💗

3:56 PM · May 20, 2026 · 2.4K Views

try their demo and check the models https://huggingface.co/collections/CohereLabs/command-a-plus

merve @mervenoyann

Cohere dropped Command A+ 🔥
> 25B/219B MoE vision language model
> supports 48 languages with efficient tokenizer
> tool-calling/agentic + 128k context window
> transformers day-0 support 🤗 free license 💗

3:56 PM · May 20, 2026 · 2.4K Views
3:57 PM · May 20, 2026 · 839 Views

Out today! Our most capable agentic model:
- Runs on one B200
- 48 languages (including العربية, 日本語, 한국어)
- Open source (Apache 2.0)
- Multimodal: text + images
- 218B Mixture-of-Experts model, 25B active parameters

5:06 PM · May 20, 2026 · 2.6K Views
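
For a rough sense of scale, the parameter counts quoted in the post above can be turned into a weight-only memory estimate. This is a back-of-the-envelope sketch, not a figure from Cohere: the bits-per-parameter options are assumptions about possible storage precisions, and the estimate ignores the KV cache, activations, and runtime overhead.

```python
TOTAL_PARAMS = 218e9   # "218B Mixture-of-Experts model" (from the post)
ACTIVE_PARAMS = 25e9   # "25B active parameters" (from the post)


def weight_gb(num_params: float, bits_per_param: int) -> float:
    """Weight-only footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumed storage precisions; the post does not say which one is used.
footprints = {bits: weight_gb(TOTAL_PARAMS, bits) for bits in (16, 8, 4)}

# Share of the parameters that take part in computing a single token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

if __name__ == "__main__":
    for bits, gb in footprints.items():
        print(f"{bits}-bit weights: {gb:.0f} GB")
    print(f"active per token: {active_fraction:.1%}")
```

The gap between total and active parameters is what the Mixture-of-Experts design trades on: all weights must be stored, but only a fraction is used for each token.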

@aidangomez congrats on the release, would love to see a tech report here, lot of interesting/unusual choices

elie @eliebakouch

interesting open model by cohere with lots of unusual architecture choices, here is a recap:
> parallel transformer, so MoE and attention are computed in parallel. likely doing some kind of MLP/attention disaggregation here?
> lots of query heads, query total dim is 4x hidden size
> big shared expert, 4x router size
> no scaling after normalization of the top k
> LayerNorm instead of RMS norm
> 32 layer only, no dense layer at the start

8:36 PM · May 20, 2026 · 2.8K Views
8:49 PM · May 20, 2026 · 68 Views

interesting open model by cohere with lots of unusual architecture choices, here is a recap:

> parallel transformer, so MoE and attention are computed in parallel. likely doing some kind of MLP/attention disaggregation here?
> lots of query heads, query total dim is 4x hidden size
> big shared expert, 4x router size
> no scaling after normalization of the top k
> LayerNorm instead of RMS norm
> 32 layer only, no dense layer at the start

8:36 PM · May 20, 2026 · 2.8K Views
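
The "parallel transformer" point in the recap can be illustrated with a toy block in which attention and the feed-forward path both read the same normalized input and their outputs are added to the residual together. This is a minimal NumPy sketch under stated assumptions, not Cohere's implementation: it uses a single attention head, a plain ReLU MLP standing in for the MoE layer, LayerNorm without learned scale or bias, and hypothetical weight names throughout.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    # LayerNorm without learned scale/bias (simplifying assumption).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def attention(h, p):
    # Single-head causal self-attention (illustrative; real models use many heads).
    q, k, v = h @ p["wq"], h @ p["wk"], h @ p["wv"]
    scores = q @ k.T / np.sqrt(q.shape[-1])
    future = np.triu(np.ones(scores.shape, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return (weights @ v) @ p["wo"]


def mlp(h, p):
    # Dense ReLU MLP standing in for the MoE layer.
    return np.maximum(h @ p["w1"], 0.0) @ p["w2"]


def parallel_block(x, p):
    # One normalization feeds both branches; both outputs join the residual.
    h = layer_norm(x)
    return x + attention(h, p) + mlp(h, p)


def sequential_block(x, p):
    # Conventional pre-norm block for comparison: the MLP sees the attention output.
    x = x + attention(layer_norm(x), p)
    return x + mlp(layer_norm(x), p)


def make_params(rng, d_model, d_ff):
    # Hypothetical toy weights, small random values.
    shapes = {"wq": (d_model, d_model), "wk": (d_model, d_model),
              "wv": (d_model, d_model), "wo": (d_model, d_model),
              "w1": (d_model, d_ff), "w2": (d_ff, d_model)}
    return {name: rng.normal(scale=0.1, size=shape) for name, shape in shapes.items()}


rng = np.random.default_rng(0)
params = make_params(rng, d_model=8, d_ff=16)
x = rng.normal(size=(5, 8))  # 5 tokens, hidden size 8
y_parallel = parallel_block(x, params)
y_sequential = sequential_block(x, params)
```

Because the two branches in `parallel_block` do not depend on each other, they can in principle be computed concurrently, which is the kind of speed-up the replies in this thread discuss.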

@torchcompiled yeah actually at training it's pretty nice as well you're right

Ethan @torchcompiled

@eliebakouch Have seen a few parallelized MLP and attention cases, I wanna say one of the flux/SD models did that? My understanding is it’s a small perf hit but with proper parallelization can be a cozy inference and training speed up

8:46 PM · May 20, 2026 · 108 Views
8:48 PM · May 20, 2026 · 91 Views

@aidangomez A M A Z I N G !!!!!!

Aidan Gomez @aidangomez

Our first fully open source Apache 2 model :)

4:51 PM · May 20, 2026 · 4.3K Views
5:01 PM · May 20, 2026 · 238 Views

@1vnzh @aidangomez real ones know ✊

Ivan Zhang @1vnzh

@aidangomez transcribe erasure bro

4:52 PM · May 20, 2026 · 339 Views
5:26 PM · May 20, 2026 · 75 Views

4 shared experts with 8 routed experts active? so 12/132, that's crazy, i wonder why. most papers like Towards Greater Leverage would suggest 1 shared expert or minimal (i think we should decouple shared expert size anyway eventually)

also, 128 attention heads with GQA???

5:24 PM · May 20, 2026 · 5.7K Views
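
The expert counts in the post above work out as follows. The sketch assumes that "12/132" means 128 routed experts plus 4 always-on shared experts, with 8 routed experts chosen per token; the softmax router and top-k renormalization are generic MoE conventions used here for illustration, not confirmed details of Command A+.

```python
import numpy as np

NUM_SHARED = 4        # shared experts, always active (from the post)
TOP_K = 8             # routed experts active per token (from the post)
TOTAL_EXPERTS = 132   # from the post's "12/132"
NUM_ROUTED = TOTAL_EXPERTS - NUM_SHARED  # 128, inferred from those numbers


def route(router_logits, top_k=TOP_K):
    """Pick the top_k routed experts for one token and renormalize their weights."""
    probs = np.exp(router_logits - router_logits.max())
    probs /= probs.sum()
    top_idx = np.argsort(probs)[::-1][:top_k]        # highest probability first
    top_w = probs[top_idx] / probs[top_idx].sum()    # renormalized to sum to 1
    return top_idx, top_w


rng = np.random.default_rng(0)
idx, w = route(rng.normal(size=NUM_ROUTED))  # random logits stand in for a real router

active_experts = NUM_SHARED + len(idx)
active_fraction = active_experts / TOTAL_EXPERTS

if __name__ == "__main__":
    print(f"{active_experts}/{TOTAL_EXPERTS} experts active ({active_fraction:.1%})")
```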

We ❤️ Open Source

3:40 PM · May 20, 2026 · 1.8K Views

