5h ago

Leaked architecture details show Microsoft trained multiple 600B-parameter MoE models before finalizing its 1T-parameter MAI-Base-1

MAI-Base-1 utilizes 35B active parameters across 512 experts

Sentiment: 0% positive, 100% negative

Some users sarcastically dismissed Microsoft's training of three DeepSeek-V3-sized models, suggesting it had picked the wrong scale to emulate.

1 comment with sentiment.
