/AI · 5h ago

Internal specifications reveal that Microsoft scaled its MAI-Base-1 mixture-of-experts (MoE) model to one trillion total parameters across multiple training runs.

The final version uses 35 billion active parameters.



Lisan al Gaib (@scaling01) · #980 in AI

Microsoft trained three DeepSeek-V3 sized models just for funsies and you are wondering if there's a compute gap between US and China

lmao

3:17 PM · Jun 2, 2026 · 30K Views


Sentiment

Positive commenters highlight the rare global scale of Microsoft's compute, while negative commenters dismiss the MAI effort as a waste of resources that has failed to surpass DeepSeek.

Positive: 25.0% · Negative: 75.0% (based on 4 comments with sentiment)


Posts from X

Most activity · Views: 2.8K · Likes: 61

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) (@teortaxesTex)

@scaling01 Tbh "for funsies" here means "basically restarted their entire research program to figure out the training of modern LLMs"

