Tech · 27d ago

Microsoft AI details Rocket, an in-house distributed reinforcement learning framework using SGLang to train its MAI-Thinking-1 model

The system supports training across thousands of GPUs.

#851

Original post

Ying Sheng · #851

slime@slime_framework

Huge congrats to the Microsoft AI team on MAI-Thinking-1.

Great to see large-scale RL systems converging around the SGLang + Ray ecosystem. Rocket’s design—async RL, separated rollout / inference / learner pools, router-based traffic control, prefix caching, and fault-tolerant inference—is very aligned with what we believe in slime: RL is not just an algorithm problem, but a full-stack infrastructure problem.

Excited to see more open RL infra ideas validated at frontier scale!

LMSYS Org@lmsysorg

Huge milestone for the Microsoft AI team: seven frontier MAI models, led by MAI-Thinking-1. Proud that SGLang powered the RL inference stack behind it. Their Rocket framework runs SGLang and the SGLang router for load balancing, traffic control, prefix caching, and graceful failure recovery across thousands of inference chips.

Congrats to the team @MicrosoftAI 👏

Read more on how SGLang powers the stack: https://microsoft.ai/wp-content/uploads/2026/06/main_20260602_2.pdf

3:01 AM · Jun 3, 2026 · 7.5K Views

Sentiment

Users praised Microsoft's report on its SGLang-based RL inference stack for strong efficiency metrics, such as higher throughput per watt, and for unusually detailed transparency about the training run.

Pos: 100.0% · Neg: 0.0% (5 comments with sentiment)

Related links

main_20260602_2.pdf (via microsoft.ai)

Posts from X

Most Activity

Views: 2.2K · Bookmarks: 7 · Replies: 4

elie@eliebakouch

and if you're done with this thread and still want to read more about this report, please take a look at the goat @stochasticchasm recap

27d2.2K187

Likes: 33

elie@eliebakouch

this was an insanely good read, i think this is the most detailed report i've read at this scale in some aspects. i really hope MAI continues releasing those tech reports, thanks a lot to the team for this gift 🥹 https://microsoft.ai/wp-content/uploads/2026/06/main_20260602_2.pdf#page=81.11

27d1.3K334

Retweets: 4

elie@eliebakouch

microsoft uses SGLang wow

27d3.4K517

elie@eliebakouch

will conclude with this: 40% higher throughput per watt (or is that different from "rack power budget"?) is pretty impressive and bullish for microsoft chips

27d927121

elie@eliebakouch

good infra numbers about the final training run, love the transparency here

27d84112

elie@eliebakouch

@stochasticchasm and the one from @nrehiew_ 🐐

27d83341

Deepak Vijaykeerthy@deepakvijayke

My takeaway from the RL part (something perhaps everyone agrees on) is that the initial iterations of training a reasoning model from scratch are overwhelmingly a stability-engineering problem. With the infrastructure in place, the other key was getting the data/task modelling right (I often see folks jump straight to improving the algorithms without thinking through the structure of the tasks). This should now give them a lot of freedom to explore in the algorithmic space.

27d842

Harold Benoit@harold_matmul

@eliebakouch Very happy that you liked it :D

27d411

Edoardo Maggio@northead

@eliebakouch @stochasticchasm @threadreaderapp please unroll!

27d7

Deepak Vijaykeerthy@deepakvijayke

@eliebakouch @stochasticchasm Another interesting fact is that they start their RL from a checkpoint that hasn't been exposed to reasoning trajectories!

27d152

Suresh@_Suresh2

@slime_framework router rebalance lag idled 20% of gpus when rollout waited on inference

27d90

Prithvi Jadwani | AI SEO | GEO | REDDIT SEO | GMB@Prithvi_Jadwani

@slime_framework Glad to see Rocket's async design gaining traction. The real challenge is integrating this at scale within existing workflows.

26d19

Khoa@kwafam7

@eliebakouch @TheZachMueller many such cases

27d19

ThomAub@ThomAub

@eliebakouch @TheZachMueller What else?

27d10

Thread Reader App@threadreaderapp

@northead @eliebakouch @stochasticchasm @northead Hi, please find the unroll here: https://threadreaderapp.com/thread/2061965825037254947.html Talk to you soon. 🤖

27d4

Vishaal Udandarao@vishaal_urao

@harold_matmul @eliebakouch Awesome work, enjoyed going through the report! Had a quick question about agentic evals: for Terminal-Bench, it seems you didn't use Harbor/Terminus-agent but rather a ReAct loop with tool dispatches. This might make the numbers less comparable; was there a reason not to use Terminus-agent directly?

27d4