BREAKING: Microsoft (specifically MAI, previously reflection) just announced their new LLM.
MAI-Thinking-1 and MAI-Code-1-Flash
MAI-Thinking is comparing itself to Sonnet, and showing interesting evals!
Not available to try yet sadly.
AI Judge changed title after evaluation, original title: "Microsoft announces MAI-Thinking-1, a 45B active-parameter MoE model scoring 97.0 on AIME 2025"
MAI-Code-1-Flash will integrate exclusively inside GitHub Copilot.
BREAKING: Microsoft (specifically MAI, previously reflection) just announced their new LLM.
MAI-Thinking-1 and MAI-Code-1-Flash
MAI-Thinking is comparing itself to Sonnet, and showing interesting evals!
Not available to try yet sadly.
Positive users praised Microsoft's MAI-Thinking-1 technical report for clarifying training methods and benchmarks while negative users attacked the company broadly or called the model unoriginal and inferior.

How can I access this model, via Cursor?
No, MAI-Code-1-Flash integrates exclusively into GitHub Copilot in VS Code via the model picker (and Auto mode). ¹
Cursor users can access Copilot features in some setups but not this specific Microsoft model directly.
Seven new models launching at Build: let’s go! Reasoning. Code. Image. Transcribe. Voice.
Built from scratch on a clean data lineage, designed for efficiency, working seamlessly as a family of models
Thread 🧵 #MSBuild
microsoft MAI tech report is a gold mine, one of the most transparent for a model at this scale.
this model uses zero synthetic data or distillation from previous models. this means reasoning, agentic behavior, tool use are all learned fully during post-training with no cold start. bold choice that makes it harder and requires more iterations to reach sota, but you get FULL control over your model series and it proves they are serious about being a frontier lab.
the tech report is insanely detailed and precise about numbers. to give an example, they give the exact MFU across all the iterations of the model, with the exact changes etc. they also share the full scaling ladder recipe, to my knowledge this is the first time i've seen this in a tech report at this scale
let's look at all of this in this likely very long thread 🧵
Super excited to announce seven new world-class MAI models today. They represent what we consider a new era in AI designed to keep you in control and on the frontier. First is our text foundation model, MAI-Thinking-1, exceptionally strong on reasoning and SWE tasks. - It’s a 35B active parameter MoE with a 256K context window. Independent human raters on Surge prefer it for overall quality in blind side-by-sides versus Sonnet 4.6, and it’s achieved 97% on AIME 2025, the key measure of its general-purpose reasoning abilities. - It's at 53% on SWE Bench Pro, placing it right alongside Opus 4.6 on one of the toughest coding benchmarks. - And since we co-designed our models with our own silicon, MAI-Thinking-1 is optimized on our MAIA 200 chip. Benchmarking head-to-head against the GB200, we see 30% better performance per dollar as well as a 1.4x performance-per-watt gain when running our MAI models on the MAIA 200 end-to-end.
Next is MAI-Image-2.5 and its Flash variant. Two super strong models now at #2 on the leaderboards, surpassing the score of Nano Banana 2 on image editing.
Last for now is MAI-Code-1-Flash, our new inference efficient coding model, especially tuned for VS Code and GitHub Copilot CLI. - Code-1-Flash achieves 51% on SWE Bench Pro, despite having just 5B parameters, putting it closer to Haiku in size but cheaper in cost.
All of this is the foundation for Microsoft Frontier Tuning. It lets you customize our models to create custom, company-specific agents that only you control. You can make our model, your model. Your data. Your agents. Your moat.
Early adopters are already seeing a difference. When we tuned our models for McKinsey’s tasks, MAI delivered the highest win rate, outperforming GPT-5.5 on quality, while being 10x lower on cost.
Also really excited to be collaborating with the amazing team at Mayo Clinic to jointly train a new frontier AI model for healthcare.
Our announcements today mark another milestone on the road to humanist superintelligence. You can learn more and about our other new models in our latest blog: https://microsoft.ai/news/building-a-hillclimbing-machine-launching-seven-new-mai-models/
WOW microsoft new "MAI Thinking 1" model comes with a 109 page tech report that looks REALLY detailed, this is amazing
everyone who doubted Mustafa real quiet right now
Super excited to announce seven new world-class MAI models today. They represent what we consider a new era in AI designed to keep you in control and on the frontier. First is our text foundation model, MAI-Thinking-1, exceptionally strong on reasoning and SWE tasks. - It’s a 35B active parameter MoE with a 256K context window. Independent human raters on Surge prefer it for overall quality in blind side-by-sides versus Sonnet 4.6, and it’s achieved 97% on AIME 2025, the key measure of its general-purpose reasoning abilities. - It's at 53% on SWE Bench Pro, placing it right alongside Opus 4.6 on one of the toughest coding benchmarks. - And since we co-designed our models with our own silicon, MAI-Thinking-1 is optimized on our MAIA 200 chip. Benchmarking head-to-head against the GB200, we see 30% better performance per dollar as well as a 1.4x performance-per-watt gain when running our MAI models on the MAIA 200 end-to-end.
Next is MAI-Image-2.5 and its Flash variant. Two super strong models now at #2 on the leaderboards, surpassing the score of Nano Banana 2 on image editing.
Last for now is MAI-Code-1-Flash, our new inference efficient coding model, especially tuned for VS Code and GitHub Copilot CLI. - Code-1-Flash achieves 51% on SWE Bench Pro, despite having just 5B parameters, putting it closer to Haiku in size but cheaper in cost.
All of this is the foundation for Microsoft Frontier Tuning. It lets you customize our models to create custom, company-specific agents that only you control. You can make our model, your model. Your data. Your agents. Your moat.
Early adopters are already seeing a difference. When we tuned our models for McKinsey’s tasks, MAI delivered the highest win rate, outperforming GPT-5.5 on quality, while being 10x lower on cost.
Also really excited to be collaborating with the amazing team at Mayo Clinic to jointly train a new frontier AI model for healthcare.
Our announcements today mark another milestone on the road to humanist superintelligence. You can learn more and about our other new models in our latest blog: https://microsoft.ai/news/building-a-hillclimbing-machine-launching-seven-new-mai-models/
okay live reading/reacting thread here, there's a ton to go through
quite a gold mine, so much info
Climbing with no distillation, like the Big Boys do, has been super fun! Read the tech report for a taste of our ̶s̶u̶f̶f̶e̶r̶i̶n̶g̶ journey
Super excited to announce seven new world-class MAI models today. They represent what we consider a new era in AI designed to keep you in control and on the frontier. First is our text foundation model, MAI-Thinking-1, exceptionally strong on reasoning and SWE tasks. - It’s a 35B active parameter MoE with a 256K context window. Independent human raters on Surge prefer it for overall quality in blind side-by-sides versus Sonnet 4.6, and it’s achieved 97% on AIME 2025, the key measure of its general-purpose reasoning abilities. - It's at 53% on SWE Bench Pro, placing it right alongside Opus 4.6 on one of the toughest coding benchmarks. - And since we co-designed our models with our own silicon, MAI-Thinking-1 is optimized on our MAIA 200 chip. Benchmarking head-to-head against the GB200, we see 30% better performance per dollar as well as a 1.4x performance-per-watt gain when running our MAI models on the MAIA 200 end-to-end.
Next is MAI-Image-2.5 and its Flash variant. Two super strong models now at #2 on the leaderboards, surpassing the score of Nano Banana 2 on image editing.
Last for now is MAI-Code-1-Flash, our new inference efficient coding model, especially tuned for VS Code and GitHub Copilot CLI. - Code-1-Flash achieves 51% on SWE Bench Pro, despite having just 5B parameters, putting it closer to Haiku in size but cheaper in cost.
All of this is the foundation for Microsoft Frontier Tuning. It lets you customize our models to create custom, company-specific agents that only you control. You can make our model, your model. Your data. Your agents. Your moat.
Early adopters are already seeing a difference. When we tuned our models for McKinsey’s tasks, MAI delivered the highest win rate, outperforming GPT-5.5 on quality, while being 10x lower on cost.
Also really excited to be collaborating with the amazing team at Mayo Clinic to jointly train a new frontier AI model for healthcare.
Our announcements today mark another milestone on the road to humanist superintelligence. You can learn more and about our other new models in our latest blog: https://microsoft.ai/news/building-a-hillclimbing-machine-launching-seven-new-mai-models/
MAI-Image-2.5 has officially released from @MicrosoftAI landing at #2 in the Image Edit Arena (Single-Image-Edit) with a score of 1401 and advances the Pareto frontier!
This puts the model +10 pts over Nano Banana 2, Grok Imagine Image Quality and ChatGPT-Image-Latest-High Fidelity.
Congrats to the @MicrosoftAI team on this big accomplishment!
Mai-1 thinking: Mid size model, 45b active parameter, MoE, side by side with sonnet 4.6
0 distillation
„Microsoft’s first reasoning model“
Mustafa Suleyman, Microsoft AI: 7 new Microsoft Models, no end in sight when it comes to development, orders of magnitude in the next few years
happy to share another quality tech report w/ the wider research community 🫶
great read for ppl who want to see all the details for methods + infra for scaling up pretraining & RL, esp detailed discussion about data which is often kept vague by other labs
Super excited to announce seven new world-class MAI models today. They represent what we consider a new era in AI designed to keep you in control and on the frontier. First is our text foundation model, MAI-Thinking-1, exceptionally strong on reasoning and SWE tasks. - It’s a 35B active parameter MoE with a 256K context window. Independent human raters on Surge prefer it for overall quality in blind side-by-sides versus Sonnet 4.6, and it’s achieved 97% on AIME 2025, the key measure of its general-purpose reasoning abilities. - It's at 53% on SWE Bench Pro, placing it right alongside Opus 4.6 on one of the toughest coding benchmarks. - And since we co-designed our models with our own silicon, MAI-Thinking-1 is optimized on our MAIA 200 chip. Benchmarking head-to-head against the GB200, we see 30% better performance per dollar as well as a 1.4x performance-per-watt gain when running our MAI models on the MAIA 200 end-to-end.
Next is MAI-Image-2.5 and its Flash variant. Two super strong models now at #2 on the leaderboards, surpassing the score of Nano Banana 2 on image editing.
Last for now is MAI-Code-1-Flash, our new inference efficient coding model, especially tuned for VS Code and GitHub Copilot CLI. - Code-1-Flash achieves 51% on SWE Bench Pro, despite having just 5B parameters, putting it closer to Haiku in size but cheaper in cost.
All of this is the foundation for Microsoft Frontier Tuning. It lets you customize our models to create custom, company-specific agents that only you control. You can make our model, your model. Your data. Your agents. Your moat.
Early adopters are already seeing a difference. When we tuned our models for McKinsey’s tasks, MAI delivered the highest win rate, outperforming GPT-5.5 on quality, while being 10x lower on cost.
Also really excited to be collaborating with the amazing team at Mayo Clinic to jointly train a new frontier AI model for healthcare.
Our announcements today mark another milestone on the road to humanist superintelligence. You can learn more and about our other new models in our latest blog: https://microsoft.ai/news/building-a-hillclimbing-machine-launching-seven-new-mai-models/
Finally, Microsoft enters the AI arena!
They just announced two foundation models - MAI and Mai-Flash
They used to have a great research lab... and early benchmarks look good 🚀🚀
Microsoft also released MAI-Code-1-Flash
It's a 137B parameter MoE coding model with a context length of 256K tokens trained on over 10T tokens.
At launch it will be available only in GitHub Copilot in Visual Studio Code.
It seems to be stronger and more efficient than Claude 4.5 Haiku.
Microsoft introduces MAI-Thinking-1
It's a 1T@35B parameter model pre-trained on 30T tokens with a maximum context length of 256k tokens using 8192 GB200 GPUs.
Based on benchmarks it seems to be around GLM-5 level.
Microsoft also released a comprehensive 109 pages tech-report: https://microsoft.ai/wp-content/uploads/2026/06/main_20260602_2.pdf
Super detailed tech report for MAI-Thinking-1, with a ton of info on all stages of the pipeline. I'm surprised so much of this info is released :)
Super long thread on my notes:
Super excited to announce seven new world-class MAI models today. They represent what we consider a new era in AI designed to keep you in control and on the frontier. First is our text foundation model, MAI-Thinking-1, exceptionally strong on reasoning and SWE tasks. - It’s a 35B active parameter MoE with a 256K context window. Independent human raters on Surge prefer it for overall quality in blind side-by-sides versus Sonnet 4.6, and it’s achieved 97% on AIME 2025, the key measure of its general-purpose reasoning abilities. - It's at 53% on SWE Bench Pro, placing it right alongside Opus 4.6 on one of the toughest coding benchmarks. - And since we co-designed our models with our own silicon, MAI-Thinking-1 is optimized on our MAIA 200 chip. Benchmarking head-to-head against the GB200, we see 30% better performance per dollar as well as a 1.4x performance-per-watt gain when running our MAI models on the MAIA 200 end-to-end.
Next is MAI-Image-2.5 and its Flash variant. Two super strong models now at #2 on the leaderboards, surpassing the score of Nano Banana 2 on image editing.
Last for now is MAI-Code-1-Flash, our new inference efficient coding model, especially tuned for VS Code and GitHub Copilot CLI. - Code-1-Flash achieves 51% on SWE Bench Pro, despite having just 5B parameters, putting it closer to Haiku in size but cheaper in cost.
All of this is the foundation for Microsoft Frontier Tuning. It lets you customize our models to create custom, company-specific agents that only you control. You can make our model, your model. Your data. Your agents. Your moat.
Early adopters are already seeing a difference. When we tuned our models for McKinsey’s tasks, MAI delivered the highest win rate, outperforming GPT-5.5 on quality, while being 10x lower on cost.
Also really excited to be collaborating with the amazing team at Mayo Clinic to jointly train a new frontier AI model for healthcare.
Our announcements today mark another milestone on the road to humanist superintelligence. You can learn more and about our other new models in our latest blog: https://microsoft.ai/news/building-a-hillclimbing-machine-launching-seven-new-mai-models/
Three new @MicrosoftAI models now live on OpenRouter!
Launching together: MAI-Image-2.5, MAI-Transcribe-1.5, and MAI-Voice-2. More on each below 🧵
some pretty crazy RL graphs, and very interesting decision here to start RL from a checkpoint with no reasoning exposure
WOW microsoft new "MAI Thinking 1" model comes with a 109 page tech report that looks REALLY detailed, this is amazing
dspy.GEPA used in pretraining data curation in the new Microsoft AI effort :-)
Excited to see the use of GEPA-optimized LLM judges for data filtering in MAI-Thinking-1 model's pre-training pipeline!
Today we announced MAI-Thinking-1, a strong generalist and reasoning LLM built from the ground up without distilling third-party models. 97% on AIME 2025; 53% on SWE-Bench Pro; preferred by human raters over Sonnet 4.6 (blind side-by-side).
Tech report: https://microsoft.ai/wp-content/uploads/2026/06/main_20260602_2.pdf
the best cookbook about how to train a sota llm right now.
WOW microsoft new "MAI Thinking 1" model comes with a 109 page tech report that looks REALLY detailed, this is amazing
love how MAI-Thinking-1 report views synthetic environments not as evaluation infrastructure but as capability infrastructure.
my current belief is that a lot of progress in agentic systems over the next few years will come less from larger models and more from richer synthetic worlds. reward hacking, deceptive behavior, tool misuse, long-horizon planning failures...these are fundamentally environment-level phenomena. they don't reliably emerge in static benchmarks.
in some sense the strongest labs may end up being the ones with the best world generators rather than the biggest models. imo synthetic environments feel increasingly analogous to self-play in AlphaGo...the real breakthrough is creating an engine that can continuously generate novel experiences with reliable feedback. if that intuition is correct environment diversity may eventually become its own scaling-law variable alongside parameters, tokens and compute.
MAI-Thinking-1 is out!
Excited to share what we are building and how climbing from scratch (no distillation) actually works: simple recipes, rigorous science, self-distillation, patience, and great infra.
Check out our tech report has the full story of our RL climbs. https://microsoft.ai/wp-content/uploads/2026/06/main_20260602_2.pdf
Excited to see the use of GEPA-optimized LLM judges for data filtering in MAI-Thinking-1 model's pre-training pipeline!
Super excited to announce seven new world-class MAI models today. They represent what we consider a new era in AI designed to keep you in control and on the frontier. First is our text foundation model, MAI-Thinking-1, exceptionally strong on reasoning and SWE tasks. - It’s a 35B active parameter MoE with a 256K context window. Independent human raters on Surge prefer it for overall quality in blind side-by-sides versus Sonnet 4.6, and it’s achieved 97% on AIME 2025, the key measure of its general-purpose reasoning abilities. - It's at 53% on SWE Bench Pro, placing it right alongside Opus 4.6 on one of the toughest coding benchmarks. - And since we co-designed our models with our own silicon, MAI-Thinking-1 is optimized on our MAIA 200 chip. Benchmarking head-to-head against the GB200, we see 30% better performance per dollar as well as a 1.4x performance-per-watt gain when running our MAI models on the MAIA 200 end-to-end.
Next is MAI-Image-2.5 and its Flash variant. Two super strong models now at #2 on the leaderboards, surpassing the score of Nano Banana 2 on image editing.
Last for now is MAI-Code-1-Flash, our new inference efficient coding model, especially tuned for VS Code and GitHub Copilot CLI. - Code-1-Flash achieves 51% on SWE Bench Pro, despite having just 5B parameters, putting it closer to Haiku in size but cheaper in cost.
All of this is the foundation for Microsoft Frontier Tuning. It lets you customize our models to create custom, company-specific agents that only you control. You can make our model, your model. Your data. Your agents. Your moat.
Early adopters are already seeing a difference. When we tuned our models for McKinsey’s tasks, MAI delivered the highest win rate, outperforming GPT-5.5 on quality, while being 10x lower on cost.
Also really excited to be collaborating with the amazing team at Mayo Clinic to jointly train a new frontier AI model for healthcare.
Our announcements today mark another milestone on the road to humanist superintelligence. You can learn more and about our other new models in our latest blog: https://microsoft.ai/news/building-a-hillclimbing-machine-launching-seven-new-mai-models/
what an absolutely incredible tech report, made a ~50 tweet recap (?) of my notes in this thread
microsoft MAI tech report is a gold mine, one of the most transparent for a model at this scale.
this model uses zero synthetic data or distillation from previous models. this means reasoning, agentic behavior, tool use are all learned fully during post-training with no cold start. bold choice that makes it harder and requires more iterations to reach sota, but you get FULL control over your model series and it proves they are serious about being a frontier lab.
the tech report is insanely detailed and precise about numbers. to give an example, they give the exact MFU across all the iterations of the model, with the exact changes etc. they also share the full scaling ladder recipe, to my knowledge this is the first time i've seen this in a tech report at this scale
let's look at all of this in this likely very long thread 🧵