Cohere releases Command A+, its most powerful large language model to date, as open-source Apache 2.0 software that runs on two H100 GPUs with 30% lower latency

QUOTE POST

Cohere is on such a great open-source trajectory lately. Beautiful Apache 2.0 model! https://huggingface.co/CohereLabs/command-a-plus-05-2026-bf16

Nick Frosst@nickfrosst

Command A+ from @cohere is out now :) it's our best model yet and it's open source apache 2.0

4:18 PM · May 20, 2026 · 96.4K Views

7:22 PM · May 20, 2026 · 24.3K Views

REPLY

#103Delip Rao e/σ@DELIPRAO

@nickfrosst @cohere Congratulations on this milestone! Looks great 👍

Nick Frosst@nickfrosst

Command A+ from @cohere is out now :) it's our best model yet and it's open source apache 2.0

4:18 PM · May 20, 2026 · 96.4K Views

9:36 PM · May 20, 2026 · 469 Views

QUOTE POST

#155Sebastian Raschka@RASBT

It's been *almost* a bit quiet around LLM architecture releases in the past two weeks 😅

Interesting tidbit is the parallel block design. Via the Cmd-A tech report: "equivalent performance but significant improvement in throughput compared to the vanilla transformer block."

Cohere@cohere

Introducing: Cohere Command A+ We’ve created our most powerful LLM yet, optimized it to run on as little hardware as possible, and released it open-source for all.

3:26 PM · May 20, 2026 · 414.1K Views

11:26 PM · May 20, 2026 · 30.1K Views
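The parallel block design discussed here can be sketched in a few lines. This is a minimal illustration of the general GPT-J/PaLM-style formulation, y = x + Attn(LN(x)) + MLP(LN(x)), placed next to a vanilla pre-norm block. The toy sublayers are hypothetical stand-ins, not Command A+'s attention or MoE, and sharing one LayerNorm between both branches is an assumption about the layout.

```python
import math
from typing import Callable, List

Vec = List[float]
Sublayer = Callable[[Vec], Vec]


def layer_norm(x: Vec, eps: float = 1e-5) -> Vec:
    """Normalize to zero mean and unit variance (no learned scale or bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def add(*vectors: Vec) -> Vec:
    """Elementwise sum of equally sized vectors."""
    return [sum(parts) for parts in zip(*vectors)]


def sequential_block(x: Vec, attn: Sublayer, mlp: Sublayer) -> Vec:
    # Vanilla pre-norm block: the MLP reads the residual stream after
    # attention has been added, so the two sublayers run back to back.
    h = add(x, attn(layer_norm(x)))
    return add(h, mlp(layer_norm(h)))


def parallel_block(x: Vec, attn: Sublayer, mlp: Sublayer) -> Vec:
    # Parallel block: one normalized input feeds both sublayers and their
    # outputs are added to the residual together, so attention and the
    # MLP/MoE have no data dependency and can be computed concurrently.
    n = layer_norm(x)
    return add(x, attn(n), mlp(n))


# Hypothetical toy sublayers, for illustration only.
def toy_attn(v: Vec) -> Vec:
    return [t * t for t in v]


def toy_mlp(v: Vec) -> Vec:
    return [math.tanh(t) for t in v]


x = [1.0, 2.0, 3.0, 4.0]
print("sequential:", sequential_block(x, toy_attn, toy_mlp))
print("parallel:  ", parallel_block(x, toy_attn, toy_mlp))
```

The two blocks are not mathematically equivalent (the MLP sees a different input), which is why the quoted claim is about empirical performance rather than identity.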

REPLY

#155Sebastian Raschka@RASBT

@eliebakouch It’s interesting. I’ve seen hidden -> wider -> hidden FFNs and hidden -> narrower -> hidden FFNs. But Command A+ seems to use 4096 -> 4096 -> 4096 for each expert FFN, which I haven’t seen before (as far as I remember)

elie@eliebakouch

@rasbt the query head x head dim being super wide is quite different from other models as well afaik

12:05 AM · May 21, 2026 · 676 Views

12:21 AM · May 21, 2026 · 553 Views
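Raschka's observation about expert FFN shapes can be made concrete by counting weights per expert. The sketch assumes a hidden size of 4096 (inferred from the 4096 -> 4096 -> 4096 shape) and a gated, SwiGLU-style FFN with three projection matrices; the thread does not confirm the activation, so `gated` is left as a parameter. `ffn_params` is an illustrative helper.

```python
def ffn_params(hidden: int, intermediate: int, gated: bool = True) -> int:
    """Weight count of one FFN expert, ignoring biases.

    A gated FFN (e.g. SwiGLU) has gate, up and down projections (3 matrices);
    a plain FFN has only up and down projections (2 matrices).
    """
    n_matrices = 3 if gated else 2
    return n_matrices * hidden * intermediate


HIDDEN = 4096  # assumption: the hidden size implied by 4096 -> 4096 -> 4096

shapes = {
    "wider (4x)": 4 * HIDDEN,
    "same width (1x)": HIDDEN,
    "narrower (1/4x)": HIDDEN // 4,
}
for name, intermediate in shapes.items():
    millions = ffn_params(HIDDEN, intermediate) / 1e6
    print(f"{name:>16}: {HIDDEN} -> {intermediate} -> {HIDDEN}, {millions:.1f}M params")
```

Per-expert cost scales linearly with the intermediate width, so a square expert sits between the classic 4x-wide dense FFN and the narrow experts of fine-grained MoEs.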

QUOTE POST

#171Aidan Gomez@AIDANGOMEZ

Our first fully open source Apache 2 model :)

Cohere@cohere

Introducing: Cohere Command A+ We’ve created our most powerful LLM yet, optimized it to run on as little hardware as possible, and released it open-source for all.

3:26 PM · May 20, 2026 · 414.1K Views

4:51 PM · May 20, 2026 · 7.5K Views

QUOTE POST

#171Aidan Gomez@AIDANGOMEZ

Nick really championed us going Apache 2 for this release and for Cohere Transcribe. Not an obvious decision and one that required many discussions. Like Nick says, I hope the model is more useful and empowering as a result.

Nick Frosst@nickfrosst

Command A+ from @cohere is out now :) it's our best model yet and it's open source apache 2.0

4:18 PM · May 20, 2026 · 96.4K Views

8:33 PM · May 20, 2026 · 7.3K Views

QUOTE POST

#208Stella Biderman @ ICLR@BLANCHEMINERVA

Very cool to see that cohere is continuing to use the parallel attention and MLP setup @AiEleuther introduced

Cohere@cohere

Introducing: Cohere Command A+ We’ve created our most powerful LLM yet, optimized it to run on as little hardware as possible, and released it open-source for all.

3:26 PM · May 20, 2026 · 414.1K Views

8:50 PM · May 20, 2026 · 1.4K Views

QUOTE POST

#216Joelle Pineau@JPINEAU1

Check out our latest open-source model, built for efficiency, with a focus on business use-cases, available for all.

Cohere@cohere

Introducing: Cohere Command A+ We’ve created our most powerful LLM yet, optimized it to run on as little hardware as possible, and released it open-source for all.

3:26 PM · May 20, 2026 · 414.1K Views

7:41 PM · May 20, 2026 · 2.7K Views

QUOTE POST

#258Marc G. Bellemare@MARCGBELLEMARE

What a week to start at Cohere!

Cohere@cohere

Introducing: Cohere Command A+ We’ve created our most powerful LLM yet, optimized it to run on as little hardware as possible, and released it open-source for all.

3:26 PM · May 20, 2026 · 414.1K Views

3:45 PM · May 20, 2026 · 1.1K Views

REPLY

#339will brown@WILLCCBB

@nickfrosst @cohere hell yeah! model looks great

Nick Frosst@nickfrosst

Command A+ from @cohere is out now :) it's our best model yet and it's open source apache 2.0

4:18 PM · May 20, 2026 · 96.4K Views

4:38 PM · May 20, 2026 · 1.8K Views

QUOTE POST

#420Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@TEORTAXESTEX

Cohere has fallen, DS-MoE shape reigns supreme

Cohere@cohere

Introducing: Cohere Command A+ We’ve created our most powerful LLM yet, optimized it to run on as little hardware as possible, and released it open-source for all.

3:26 PM · May 20, 2026 · 414.1K Views

11:27 PM · May 20, 2026 · 12.8K Views

REPLY

#475elvis@OMARSAR0

@nickfrosst @cohere Very cool release. Much needed in the space.

Nick Frosst@nickfrosst

Command A+ from @cohere is out now :) it's our best model yet and it's open source apache 2.0

4:18 PM · May 20, 2026 · 96.4K Views

1:23 AM · May 21, 2026 · 407 Views

REPLY

#488kache@YACINEMTB

@nickfrosst @cohere great work

Nick Frosst@nickfrosst

Command A+ from @cohere is out now :) it's our best model yet and it's open source apache 2.0

4:18 PM · May 20, 2026 · 96.4K Views

9:54 PM · May 20, 2026 · 813 Views

QUOTE POST

#601Nick Frosst@NICKFROSST

Open source Command A+ model

This tech can go one of two ways. It can go the way the internet and mobile phones did - in which technological hegemony resulted in a mostly disempowering tech.

Or it can empower the people that use it.

We are working towards that second one.

Cohere@cohere

Introducing: Cohere Command A+ We’ve created our most powerful LLM yet, optimized it to run on as little hardware as possible, and released it open-source for all.

3:26 PM · May 20, 2026 · 414.1K Views

4:12 PM · May 20, 2026 · 4.1K Views

QUOTE POST

#601Nick Frosst@NICKFROSST

> be cohere
> join forces with some German companies
> immediately open source your best model
> life is good

Maziyar PANAHI@MaziyarPanahi

wait… did Cohere just release Command A+ models under Apache 2.0 for the first time ever?! 🙊 welcome to Europe! 🤗

7:47 PM · May 20, 2026 · 29.8K Views

8:35 PM · May 20, 2026 · 8.2K Views

POST

#601Nick Frosst@NICKFROSST

Command A+ from @cohere is out now :) it's our best model yet and it's open source apache 2.0

4:18 PM · May 20, 2026 · 96.4K Views

POST

#675merve@MERVENOYANN

Cohere dropped Command A+ 🔥

> 25B/219B MoE vision language model
> supports 48 languages with efficient tokenizer
> tool-calling/agentic + 128k context window
> transformers day-0 support 🤗 free license 💗

3:56 PM · May 20, 2026 · 3.6K Views

REPLY

#675merve@MERVENOYANN

try their demo and check the models https://huggingface.co/collections/CohereLabs/command-a-plus

merve@mervenoyann

Cohere dropped Command A+ 🔥 > 25B/219B MoE vision language model > supports 48 languages with efficient tokenizer > tool-calling/agentic + 128k context window > transformers day-0 support 🤗 free license 💗

3:56 PM · May 20, 2026 · 3.6K Views

3:57 PM · May 20, 2026 · 992 Views

QUOTE POST

#715Jay Alammar@JAYALAMMAR

Out today! Our most capable agentic model:
- Runs on one B200
- 48 languages (including العربية, 日本語, 한국어)
- Open source (Apache 2.0)
- Multimodal: text + images
- 218B Mixture-of-Experts model, 25B active parameters

Cohere@cohere

Introducing: Cohere Command A+ We’ve created our most powerful LLM yet, optimized it to run on as little hardware as possible, and released it open-source for all.

3:26 PM · May 20, 2026 · 414.1K Views

5:06 PM · May 20, 2026 · 6.3K Views
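The parameter figures quoted in the thread (25B active out of 218B or 219B total) translate into memory with simple arithmetic. A minimal sketch that counts weights only; the precisions listed are illustrative choices, not a statement of what Cohere ships, and `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in decimal gigabytes.

    Excludes the KV cache, activations and any framework overhead.
    """
    return n_params * bytes_per_param / 1e9


TOTAL_PARAMS = 219e9   # merve's figure; Jay Alammar's post says 218B
ACTIVE_PARAMS = 25e9   # active parameters per token, per both posts

print(f"active per token: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%} of all weights")
for fmt, nbytes in [("bf16", 2.0), ("fp8", 1.0), ("4-bit", 0.5)]:
    print(f"{fmt:>5}: {weight_memory_gb(TOTAL_PARAMS, nbytes):6.1f} GB of weights")
```

Only about 11% of the weights are used per token, which is where an MoE saves compute, but every expert still has to be resident, so memory scales with the total count. In bf16 the weights alone (438 GB) would exceed the 160 GB of two 80 GB H100s, so the hardware figures in the thread presumably rely on a lower-precision format or some other technique; the thread does not say which.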

QUOTE POST

#716elie@ELIEBAKOUCH

@aidangomez congrats on the release, would love to see a tech report here, lot of interesting/unusual choices

elie@eliebakouch

interesting open model by cohere with lots of unusual architecture choices, here is a recap: > parallel transformer, so MoE and attention are computed in parallel. likely doing some kind of MLP/attention disaggregation here? > lots of query heads, query total dim is 4x hidden size > big shared expert, 4x router size > no scaling after normalization of the top k > LayerNorm instead of RMS norm > 32 layers only, no dense layer at the start

8:36 PM · May 20, 2026 · 24.9K Views

8:49 PM · May 20, 2026 · 1.3K Views
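Elie's "LayerNorm instead of RMS norm" comes down to two extra operations: mean subtraction and an optional bias. A minimal pure-Python sketch of both norms follows; whether Command A+'s LayerNorm actually carries a learned bias is an assumption, so `beta` is passed explicitly (zeros recover the bias-free case).

```python
import math
from typing import List

Vec = List[float]


def layer_norm(x: Vec, gamma: Vec, beta: Vec, eps: float = 1e-5) -> Vec:
    # Subtract the mean, divide by the standard deviation, then scale and shift.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [g * (v - mean) * inv_std + b for v, g, b in zip(x, gamma, beta)]


def rms_norm(x: Vec, gamma: Vec, eps: float = 1e-5) -> Vec:
    # No mean subtraction and no bias: divide by the root mean square only.
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v * inv_rms for v, g in zip(x, gamma)]


ones, zeros = [1.0] * 4, [0.0] * 4

centered = [-3.0, -1.0, 1.0, 3.0]   # already zero-mean: the two norms agree
shifted = [7.0, 9.0, 11.0, 13.0]    # same shape plus a constant offset of 10

print(layer_norm(centered, ones, zeros), rms_norm(centered, ones))
print(layer_norm(shifted, ones, zeros), rms_norm(shifted, ones))
```

For a zero-mean input the two coincide; after a constant shift LayerNorm returns the same output while RMSNorm does not. That shift invariance is what LayerNorm pays an extra reduction for.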

REPLY

#716elie@ELIEBAKOUCH

@rasbt the query head x head dim being super wide is quite different from other models as well afaik

Sebastian Raschka@rasbt

It's been *almost* a bit quiet around LLM architecture releases in the past two weeks 😅 Interesting tidbit is the parallel block design. Via the Cmd-A tech report: "equivalent performance but significant improvement in throughput compared to the vanilla transformer block."

11:26 PM · May 20, 2026 · 30.1K Views

12:05 AM · May 21, 2026 · 676 Views
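The "super wide" query projection can be worked through numerically. The sketch assumes a hidden size of 4096, takes the 128 query heads stochasm mentions later in the thread together with "query total dim is 4x hidden size" from Elie's recap, and derives the head dimension from those. The KV-head count (8) is hypothetical, the cache estimate assumes all 32 layers use full attention over 131,072 tokens (reading "128k" as 128 × 1024) with a bf16 cache, and `attention_shapes`, `kv_group` and `kv_cache_bytes` are illustrative helpers.

```python
def attention_shapes(hidden: int, n_q_heads: int, q_width_multiplier: int) -> dict:
    """Derive head_dim and Q-projection size from a query width of k * hidden."""
    q_width = q_width_multiplier * hidden
    if q_width % n_q_heads:
        raise ValueError("query width must split evenly across heads")
    return {
        "head_dim": q_width // n_q_heads,
        "q_width": q_width,
        "q_proj_params": hidden * q_width,  # ignoring any bias
    }


def kv_group(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """Grouped-query attention: consecutive query heads share one K/V head."""
    if n_q_heads % n_kv_heads:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    return q_head // (n_q_heads // n_kv_heads)


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Per-sequence KV cache size; the leading 2 counts keys and values."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


shapes = attention_shapes(hidden=4096, n_q_heads=128, q_width_multiplier=4)
print(shapes)  # head_dim 128, q_width 16384

N_KV_HEADS = 8  # hypothetical: the thread does not give the KV head count
print([kv_group(h, 128, N_KV_HEADS) for h in (0, 15, 16, 127)])

gqa = kv_cache_bytes(32, N_KV_HEADS, shapes["head_dim"], seq_len=131_072)
mha = kv_cache_bytes(32, 128, shapes["head_dim"], seq_len=131_072)
print(f"GQA: {gqa / 2**30:.0f} GiB, MHA: {mha / 2**30:.0f} GiB per 128k-token sequence")
```

Under these assumptions each head is a common 128 wide, so the extra width comes from the head count rather than from unusually large heads. With the hypothetical 8 KV heads, grouping cuts the cache 16-fold compared with giving every query head its own K/V.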

QUOTE POST

#716elie@ELIEBAKOUCH

interesting open model by cohere with lots of unusual architecture choices, here is a recap:

> parallel transformer, so MoE and attention are computed in parallel. likely doing some kind of MLP/attention disaggregation here?
> lots of query heads, query total dim is 4x hidden size
> big shared expert, 4x router size
> no scaling after normalization of the top k
> LayerNorm instead of RMS norm
> 32 layers only, no dense layer at the start

Cohere@cohere

Introducing: Cohere Command A+ We’ve created our most powerful LLM yet, optimized it to run on as little hardware as possible, and released it open-source for all.

3:26 PM · May 20, 2026 · 414.1K Views

8:36 PM · May 20, 2026 · 24.9K Views
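The routing items in the recap ("big shared expert", "no scaling after normalization of the top k") can be sketched as a toy MoE layer. This reads the recap as: the top-k gate weights are renormalized to sum to 1 and no extra multiplier is applied afterwards (some DeepSeek-style MoEs apply a routed scaling factor at that point). Softmax gating and unweighted shared experts are assumptions, and the experts are hypothetical toy functions.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vec = List[float]
Expert = Callable[[Vec], Vec]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(logits: Sequence[float], k: int, normalize: bool = True,
                scale: float = 1.0) -> Tuple[List[int], List[float]]:
    """Pick the k highest-probability experts and their gate weights.

    normalize=True rescales the k weights to sum to 1; `scale` is an extra
    multiplier applied afterwards ("no scaling" corresponds to 1.0 here).
    """
    probs = softmax(logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    weights = [probs[i] for i in top]
    if normalize:
        s = sum(weights)
        weights = [w / s for w in weights]
    return top, [w * scale for w in weights]


def moe_forward(x: Vec, router_logits: Sequence[float], routed: Sequence[Expert],
                shared: Sequence[Expert], k: int) -> Vec:
    out = [0.0] * len(x)
    # Shared experts process every token and are added without a gate weight.
    for expert in shared:
        out = [o + y for o, y in zip(out, expert(x))]
    # Routed experts: only the top-k run, each weighted by its gate value.
    idx, weights = route_top_k(router_logits, k)
    for i, w in zip(idx, weights):
        out = [o + w * y for o, y in zip(out, routed[i](x))]
    return out


# Toy experts (hypothetical): routed expert i outputs the constant i.
routed = [lambda v, i=i: [float(i)] * len(v) for i in range(4)]
shared = [lambda v: list(v)]
print(route_top_k([1.0, 3.0, 2.0, 0.0], k=2))
print(moe_forward([1.0, 1.0], [1.0, 3.0, 2.0, 0.0], routed, shared, k=2))
```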

REPLY

#716elie@ELIEBAKOUCH

@torchcompiled yeah actually at training it's pretty nice as well you're right

Ethan@torchcompiled

@eliebakouch Have seen a few parallelized MLP and attention cases, I wanna say one of the flux/SD models did that? My understanding is it’s a small perf hit but with proper parallelization can be a cozy inference and training speed up

8:46 PM · May 20, 2026 · 773 Views

8:48 PM · May 20, 2026 · 567 Views

REPLY

#716elie@ELIEBAKOUCH

also they have a "use_parallel_embedding" flag in the config? engram next?

elie@eliebakouch

interesting open model by cohere with lots of unusual architecture choices, here is a recap: > parallel transformer, so MoE and attention are computed in parallel. likely doing some kind of MLP/attention disaggregation here? > lots of query heads, query total dim is 4x hidden size > big shared expert, 4x router size > no scaling after normalization of the top k > LayerNorm instead of RMS norm > 32 layers only, no dense layer at the start

8:36 PM · May 20, 2026 · 24.9K Views

9:15 PM · May 20, 2026 · 1K Views

REPLY

#716elie@ELIEBAKOUCH

@nrehiew_ yes it's like the "older version" of this basically, the one from PaLM etc.

wh@nrehiew_

@eliebakouch Similar to this? This parallel arch basically halves layer count. The 1/2 weight and router stuff is different though

11:51 PM · May 20, 2026 · 419 Views

12:01 AM · May 21, 2026 · 272 Views

QUOTE POST

#716elie@ELIEBAKOUCH

prettier version of this visualization here, if you're only seeing this now

Sebastian Raschka@rasbt

It's been *almost* a bit quiet around LLM architecture releases in the past two weeks 😅 Interesting tidbit is the parallel block design. Via the Cmd-A tech report: "equivalent performance but significant improvement in throughput compared to the vanilla transformer block."

11:26 PM · May 20, 2026 · 30.1K Views

12:06 AM · May 21, 2026 · 1K Views

REPLY

#1153Florian Brand@XEOPHON

@aidangomez A M A Z I N G !!!!!!

Aidan Gomez@aidangomez

Our first fully open source Apache 2 model :)

4:51 PM · May 20, 2026 · 7.5K Views

5:01 PM · May 20, 2026 · 288 Views

REPLY

#1153Florian Brand@XEOPHON

@1vnzh @aidangomez real ones know ✊

Ivan Zhang@1vnzh

@aidangomez transcribe erasure bro

4:52 PM · May 20, 2026 · 491 Views

5:26 PM · May 20, 2026 · 104 Views

REPLY

#1430wh@NREHIEW_

@eliebakouch Similar to this? This parallel arch basically halves layer count. The 1/2 weight and router stuff is different though

elie@eliebakouch

interesting open model by cohere with lots of unusual architecture choices, here is a recap: > parallel transformer, so MoE and attention are computed in parallel. likely doing some kind of MLP/attention disaggregation here? > lots of query heads, query total dim is 4x hidden size > big shared expert, 4x router size > no scaling after normalization of the top k > LayerNorm instead of RMS norm > 32 layers only, no dense layer at the start

8:36 PM · May 20, 2026 · 24.9K Views

11:51 PM · May 20, 2026 · 419 Views

QUOTE POST

#1629stochasm@STOCHASTICCHASM

4 shared experts with 8 routed experts active? so 12/132, that's crazy, i wonder why. most papers like Towards Greater Leverage would suggest 1 shared expert or minimal (i think we should decouple shared expert size anyway eventually)

also, 128 attention heads with GQA???

5:24 PM · May 20, 2026 · 11.6K Views
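Stochasm's "12/132" is an expert-count ratio and is easy to reproduce. The sketch assumes the 132 experts are 128 routed plus 4 shared, and compares against DeepSeek-V3's published layout (1 shared and 256 routed experts, 8 routed active per token), the "DS-MoE shape" mentioned earlier in the thread. It counts experts, not parameters, so if the shared experts are larger than the routed ones (as "big shared expert" suggests) the parameter shares will differ. `expert_activation` is an illustrative helper.

```python
from fractions import Fraction


def expert_activation(n_routed: int, n_shared: int, top_k: int):
    """Return (fraction of experts active per token, share of active experts
    that are shared). Counts experts, not parameters."""
    active = n_shared + top_k
    total = n_shared + n_routed
    return Fraction(active, total), Fraction(n_shared, active)


# Assumption: 132 experts = 128 routed + 4 shared, with 8 routed active.
cmd_a_plus = expert_activation(n_routed=128, n_shared=4, top_k=8)
# DeepSeek-V3: 1 shared + 256 routed experts, 8 routed active per token.
deepseek_v3 = expert_activation(n_routed=256, n_shared=1, top_k=8)

for name, (active, shared_share) in [("Command A+", cmd_a_plus),
                                     ("DeepSeek-V3", deepseek_v3)]:
    print(f"{name:>11}: {active} of experts active ({float(active):.1%}), "
          f"{shared_share} of active experts shared")
```

In this reading about a third of the active experts per token are shared, versus a ninth for DeepSeek-V3, which is the contrast stochasm is pointing at.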

QUOTE POST

#1634Ivan Zhang@1VNZH

We ❤️ Open Source

Cohere@cohere

Introducing: Cohere Command A+ We’ve created our most powerful LLM yet, optimized it to run on as little hardware as possible, and released it open-source for all.

3:26 PM · May 20, 2026 · 414.1K Views

3:40 PM · May 20, 2026 · 2.6K Views

REPLY

#1634Ivan Zhang@1VNZH

@aidangomez transcribe erasure bro

Aidan Gomez@aidangomez

Our first fully open source Apache 2 model :)

4:51 PM · May 20, 2026 · 7.5K Views

4:52 PM · May 20, 2026 · 491 Views

REPLY

#1709Lucas Nestler@CLASHLUKE

@rasbt it’s from gpt-j https://github.com/kingoflolz/mesh-transformer-jax

Sebastian Raschka@rasbt

It's been *almost* a bit quiet around LLM architecture releases in the past two weeks 😅 Interesting tidbit is the parallel block design. Via the Cmd-A tech report: "equivalent performance but significant improvement in throughput compared to the vanilla transformer block."

11:26 PM · May 20, 2026 · 30.1K Views

8:54 AM · May 21, 2026 · 14 Views

REPLY

#1884Ethan@TORCHCOMPILED

@eliebakouch Have seen a few parallelized MLP and attention cases, I wanna say one of the flux/SD models did that? My understanding is it’s a small perf hit but with proper parallelization can be a cozy inference and training speed up

elie@eliebakouch

interesting open model by cohere with lots of unusual architecture choices, here is a recap: > parallel transformer, so MoE and attention are computed in parallel. likely doing some kind of MLP/attention disaggregation here? > lots of query heads, query total dim is 4x hidden size > big shared expert, 4x router size > no scaling after normalization of the top k > LayerNorm instead of RMS norm > 32 layers only, no dense layer at the start

8:36 PM · May 20, 2026 · 24.9K Views

8:46 PM · May 20, 2026 · 773 Views
