Sebastian Raschka ranks 28 LLMs by active parameters
Sebastian Raschka posted a table from his LLM Architecture Gallery that ranks 28 large language models by the fraction of their parameters that are active per token. DeepSeek V4-Pro is the sparsest at 3.1 percent active parameters, followed by the Kimi K2 variants at 3.2 percent and Qwen3 80B-A3B at 3.8 percent. For each model the table lists active versus total parameter counts, model type, attention mechanism, and release date. It offers one comparative view of sparse models but omits KV cache size, routing overhead, context length, and hardware efficiency.
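For readers who want to reproduce the ranking on their own numbers, here is a minimal Python sketch of the metric. The parameter counts below are illustrative assumptions, not values read from Raschka's table: the Qwen3 80B-A3B split (3B active of 80B total) follows from its name, the Kimi K2 figures (roughly 32B active of about 1T total) are from its public model card, and the Xiaomi entry is back-calculated from the "4.2%, 1T" figure quoted in the thread.

```python
# Minimal sketch of the metric behind the table: the share of a model's
# parameters that are active for each generated token.

models = {
    # name: (active_params_B, total_params_B) -- illustrative assumptions
    "Qwen3 80B-A3B": (3.0, 80.0),     # "A3B" = 3B active of 80B total
    "Kimi K2": (32.0, 1000.0),        # ~32B active of ~1T (public card)
    "Xiaomi 2.5 pro": (42.0, 1000.0), # back-calculated: 4.2% of 1T
}

def active_ratio(active_b: float, total_b: float) -> float:
    """Fraction of parameters used in one forward pass per token."""
    return active_b / total_b

# Sort ascending, so the sparsest model (lowest ratio) comes first.
for name, (active_b, total_b) in sorted(
    models.items(), key=lambda kv: active_ratio(*kv[1])
):
    print(f"{name:16} {active_b:5.0f}B / {total_b:6.0f}B "
          f"= {active_ratio(active_b, total_b):.1%} active")
```

Ranking by this ratio alone is exactly the simplification the summary flags: two models with the same ratio can differ widely in KV cache footprint, routing overhead, and per-token cost on real hardware.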
The table in HTML format for easier (and non-truncated) viewing: https://sebastianraschka.com/llm-architecture-gallery/active-parameter-ratio/

Meta observation: DeepSeek is still king of the active-parameter ratio
@rasbt Xiaomi 2.5 pro missing (4.2%, 1T)

@teortaxesTex good catch, will add
I wouldn't be surprised if Google was at 1-2% active

@scaling01 This is a truncated one... but Google was on the bottom of the list

@rasbt I mean the closed Gemini 3 models

@scaling01 Likely all the incoming generation of frontier models. Extreme sparsity is economic viability.