/AI5h ago

Internal specifications reveal Microsoft scaled its MAI-Base-1 MoE model to one trillion parameters across multiple training runs

The final version uses 35 billion active parameters.

--0--
Original posts
Comments
Original post
Lisan al Gaib@scaling01#980inAI

Microsoft trained three DeepSeek-V3 sized models just for funsies and you are wondering if there's a compute gap between US and China

lmao

3:17 PM · Jun 2, 2026 · 30K Views
Sentiment
Sentiment building, check back later.
Cluster Engagement
-
Views
-
Comments
-
Reposts
-
Bookmarks
Expand data
Posts from X
Most Activity
Most ActivityTimeline
VIEWS2.8KLIKES61

@scaling01 Tbh "for funsies" here means "basically restarted their entire research program to figure out the training of modern LLMs"

Lisan al Gaib@scaling01

Microsoft trained three DeepSeek-V3 sized models just for funsies and you are wondering if there's a compute gap between US and China

lmao

5hViews 2.8KLikes 61Bookmarks 0