
Essential AI's Aleksa Gordić reveals MAI-Thinking-1 training details, including tied embedding weights and disabled NVLink SHARP

Naman Goyal linked the determinism issue to an NCCL bug.


Original post

Aleksa Gordić (水平问题) @gordic_aleksa

few surprising details about MAI-Thinking-1:
1. AdamW
2. input and output embedding weights are tied - from ablations i've seen so far this only made sense for ~4B and below
3. NVLink SHARP disabled to ensure determinism but at the cost of reduced performance

9:16 AM · Jun 21, 2026 · 684 Views
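
To put point 2 in numbers: tying the input and output embeddings removes one vocab_size x d_model matrix from the model. The sketch below works through that arithmetic. The vocabulary size, hidden sizes, and parameter totals are hypothetical round figures chosen for illustration, not MAI-Thinking-1's configuration, and the post does not say why the ablations favored tying only at smaller scales. The numbers show only that the saving shrinks relative to total model size as the model grows.

```python
def tying_savings(vocab_size: int, d_model: int) -> int:
    """Parameters removed by sharing one matrix between input and output embeddings."""
    return vocab_size * d_model


def savings_fraction(vocab_size: int, d_model: int, untied_total_params: int) -> float:
    """Saving from tying, as a fraction of the untied model's total parameter count."""
    return tying_savings(vocab_size, d_model) / untied_total_params


# Hypothetical configurations (assumptions, not taken from the post).
small = savings_fraction(vocab_size=128_000, d_model=2_048, untied_total_params=1_000_000_000)
large = savings_fraction(vocab_size=128_000, d_model=8_192, untied_total_params=70_000_000_000)

print(f"small model: {small:.1%} of parameters saved by tying")
print(f"large model: {large:.1%} of parameters saved by tying")
```

Under these assumed sizes the saving is roughly a quarter of the small model and under 2% of the large one.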


Posts from X

Naman Goyal @NamanGoyal21

@gordic_aleksa Though on 3 I think they might be unaware of https://github.com/NVIDIA/nccl/issues/1497#issuecomment-3210819243
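
For background on point 3: a common reason collective reductions are not bitwise reproducible is that floating-point addition is not associative, so the same values summed in a different order can give a different result. The sketch below shows that effect in plain Python. Neither the post nor this reply states that reduction order is the mechanism at play here. The `NCCL_NVLS_ENABLE` line shows the NCCL environment variable that controls NVLink SHARP, and using it this way is an assumption about how one might disable the feature, not something the post describes.

```python
import os


def sum_in_order(values, order):
    """Accumulate values in the given index order, as a reduction might."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


values = [0.1, 0.2, 0.3]

forward = sum_in_order(values, (0, 1, 2))   # (0.1 + 0.2) + 0.3
backward = sum_in_order(values, (2, 1, 0))  # (0.3 + 0.2) + 0.1

# Same inputs, different accumulation order, different bits.
print(forward == backward)
print(abs(forward - backward))

# Assumption: one way to turn NVLink SHARP off is via NCCL's environment
# variable, set before any NCCL communicator is created.
os.environ["NCCL_NVLS_ENABLE"] = "0"
```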
