few surprising details about MAI-Thinking-1:
1. AdamW
2. input and output embedding weights are tied - from ablations i've seen so far this only made sense for ~4B and below
3. NVLink SHARP disabled to ensure determinism but at the cost of reduced performance
Essential AI's Aleksa Gordić reveals MAI-Thinking-1 training details, including tied embedding weights and disabled NVLink SHARP
Naman Goyal linked the determinism issue to an NCCL bug.
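For context on the tied-embeddings point, here is a rough back-of-the-envelope sketch of how much of a model one embedding matrix accounts for at different scales. The vocabulary size, hidden widths, and parameter totals below are hypothetical round numbers, not MAI-Thinking-1's actual configuration, and the post does not say this is the reason behind the ~4B observation; it is just one way to see why tying matters less as models grow.

```python
def embedding_share(vocab_size: int, d_model: int, total_params: float) -> dict:
    """Size of one (vocab x d_model) embedding matrix and its share of the model."""
    emb = vocab_size * d_model
    return {
        "embedding_params": emb,
        # Tying reuses the input matrix as the output projection,
        # so it removes one matrix of this size.
        "saved_by_tying": emb,
        "share_of_total": emb / total_params,
    }


# Hypothetical configs: (label, vocab size, hidden width, total parameters).
# These are illustrative assumptions, not figures from the post.
configs = [
    ("~1B", 128_000, 2048, 1e9),
    ("~4B", 128_000, 3072, 4e9),
    ("~70B", 128_000, 8192, 70e9),
]

for name, vocab, width, total in configs:
    s = embedding_share(vocab, width, total)
    print(
        f"{name}: tying saves {s['saved_by_tying'] / 1e6:.0f}M params "
        f"({s['share_of_total']:.1%} of total)"
    )
```

Under these assumed numbers the saving drops from roughly a quarter of the model at ~1B to under 2% at ~70B, so the parameter-count argument for tying weakens with scale.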
9:16 AM · Jun 21, 2026 · 684 Views
Posts from X
Naman Goyal@NamanGoyal21
@gordic_aleksa Though on 3 I think they might be unaware of https://github.com/NVIDIA/nccl/issues/1497#issuecomment-3210819243
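For readers unfamiliar with point 3: NCCL exposes an environment variable, `NCCL_NVLS_ENABLE`, that controls its use of NVLink SHARP. The sketch below shows one way to turn it off for a launched training job. The helper name `nccl_env_without_nvls` is hypothetical, and the posts do not say how MAI-Thinking-1 actually disabled the feature or whether this setting alone is enough for determinism.

```python
import os


def nccl_env_without_nvls(base_env=None):
    """Return a copy of the environment with NVLink SHARP turned off for NCCL.

    Hypothetical helper for illustration; not taken from the posts.
    """
    env = dict(os.environ if base_env is None else base_env)
    # NCCL_NVLS_ENABLE=0 asks NCCL not to use NVLink SHARP (NVLS).
    env["NCCL_NVLS_ENABLE"] = "0"
    return env


if __name__ == "__main__":
    env = nccl_env_without_nvls()
    print("NCCL_NVLS_ENABLE =", env["NCCL_NVLS_ENABLE"])
```

The returned mapping can be passed as `env=` to `subprocess.run` when starting the training launcher, so the setting applies to that job without changing the parent shell.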