Microsoft AI announces MAI-Thinking-1 and MAI-Code-1-Flash, claiming they outperform Claude on coding and reasoning benchmarks · Digg

Posts from X

Most Activity

VIEWS306.1KLIKES3.1KRETWEETS481REPLIES122

Microsoft AI@MicrosoftAI

Seven new models launching at Build: let’s go! Reasoning. Code. Image. Transcribe. Voice.

Built from scratch on a clean data lineage, designed for efficiency, working seamlessly as a family of models

Thread 🧵 #MSBuild

27d306.1K3.1K1K

BOOKMARKS1.6K

elie@eliebakouch

microsoft MAI tech report is a gold mine, one of the most transparent for a model at this scale.

this model uses zero synthetic data or distillation from previous models. this means reasoning, agentic behavior, tool use are all learned fully during post-training with no cold start. bold choice that makes it harder and requires more iterations to reach sota, but you get FULL control over your model series and it proves they are serious about being a frontier lab.

the tech report is insanely detailed and precise about numbers. to give an example, they give the exact MFU across all the iterations of the model, with the exact changes etc. they also share the full scaling ladder recipe, to my knowledge this is the first time i've seen this in a tech report at this scale

let's look at all of this in this likely very long thread 🧵

Mustafa Suleyman@mustafasuleyman

Super excited to announce seven new world-class MAI models today. They represent what we consider a new era in AI designed to keep you in control and on the frontier. First is our text foundation model, MAI-Thinking-1, exceptionally strong on reasoning and SWE tasks. - It’s a 35B active parameter MoE with a 256K context window. Independent human raters on Surge prefer it for overall quality in blind side-by-sides versus Sonnet 4.6, and it’s achieved 97% on AIME 2025, the key measure of its general-purpose reasoning abilities. - It's at 53% on SWE Bench Pro, placing it right alongside Opus 4.6 on one of the toughest coding benchmarks. - And since we co-designed our models with our own silicon, MAI-Thinking-1 is optimized on our MAIA 200 chip. Benchmarking head-to-head against the GB200, we see 30% better performance per dollar as well as a 1.4x performance-per-watt gain when running our MAI models on the MAIA 200 end-to-end.

Next is MAI-Image-2.5 and its Flash variant. Two super strong models now at #2 on the leaderboards, surpassing the score of Nano Banana 2 on image editing.

Last for now is MAI-Code-1-Flash, our new inference efficient coding model, especially tuned for VS Code and GitHub Copilot CLI. - Code-1-Flash achieves 51% on SWE Bench Pro, despite having just 5B parameters, putting it closer to Haiku in size but cheaper in cost.

All of this is the foundation for Microsoft Frontier Tuning. It lets you customize our models to create custom, company-specific agents that only you control. You can make our model, your model. Your data. Your agents. Your moat.

Early adopters are already seeing a difference. When we tuned our models for McKinsey’s tasks, MAI delivered the highest win rate, outperforming GPT-5.5 on quality, while being 10x lower on cost.

Also really excited to be collaborating with the amazing team at Mayo Clinic to jointly train a new frontier AI model for healthcare.

Our announcements today mark another milestone on the road to humanist superintelligence. You can learn more and about our other new models in our latest blog: https://microsoft.ai/news/building-a-hillclimbing-machine-launching-seven-new-mai-models/

27d258K2K1.6K

elie@eliebakouch

WOW microsoft new "MAI Thinking 1" model comes with a 109 page tech report that looks REALLY detailed, this is amazing

27d170K901634

kache@yacineMTB

everyone who doubted Mustafa real quiet right now

Mustafa Suleyman@mustafasuleyman

Super excited to announce seven new world-class MAI models today. They represent what we consider a new era in AI designed to keep you in control and on the frontier. First is our text foundation model, MAI-Thinking-1, exceptionally strong on reasoning and SWE tasks. - It’s a 35B active parameter MoE with a 256K context window. Independent human raters on Surge prefer it for overall quality in blind side-by-sides versus Sonnet 4.6, and it’s achieved 97% on AIME 2025, the key measure of its general-purpose reasoning abilities. - It's at 53% on SWE Bench Pro, placing it right alongside Opus 4.6 on one of the toughest coding benchmarks. - And since we co-designed our models with our own silicon, MAI-Thinking-1 is optimized on our MAIA 200 chip. Benchmarking head-to-head against the GB200, we see 30% better performance per dollar as well as a 1.4x performance-per-watt gain when running our MAI models on the MAIA 200 end-to-end.

Next is MAI-Image-2.5 and its Flash variant. Two super strong models now at #2 on the leaderboards, surpassing the score of Nano Banana 2 on image editing.

Last for now is MAI-Code-1-Flash, our new inference efficient coding model, especially tuned for VS Code and GitHub Copilot CLI. - Code-1-Flash achieves 51% on SWE Bench Pro, despite having just 5B parameters, putting it closer to Haiku in size but cheaper in cost.

All of this is the foundation for Microsoft Frontier Tuning. It lets you customize our models to create custom, company-specific agents that only you control. You can make our model, your model. Your data. Your agents. Your moat.

Early adopters are already seeing a difference. When we tuned our models for McKinsey’s tasks, MAI delivered the highest win rate, outperforming GPT-5.5 on quality, while being 10x lower on cost.

Also really excited to be collaborating with the amazing team at Mayo Clinic to jointly train a new frontier AI model for healthcare.

Our announcements today mark another milestone on the road to humanist superintelligence. You can learn more and about our other new models in our latest blog: https://microsoft.ai/news/building-a-hillclimbing-machine-launching-seven-new-mai-models/

27d222.9K1.2K396

stochasm@stochasticchasm

okay live reading/reacting thread here, there's a ton to go through

stochasm@stochasticchasm

quite a gold mine, so much info

27d68.3K510435

Luca Soldaini 🎀@soldni

Climbing with no distillation, like the Big Boys do, has been super fun! Read the tech report for a taste of our ̶s̶u̶f̶f̶e̶r̶i̶n̶g̶ journey

Mustafa Suleyman@mustafasuleyman

Super excited to announce seven new world-class MAI models today. They represent what we consider a new era in AI designed to keep you in control and on the frontier. First is our text foundation model, MAI-Thinking-1, exceptionally strong on reasoning and SWE tasks. - It’s a 35B active parameter MoE with a 256K context window. Independent human raters on Surge prefer it for overall quality in blind side-by-sides versus Sonnet 4.6, and it’s achieved 97% on AIME 2025, the key measure of its general-purpose reasoning abilities. - It's at 53% on SWE Bench Pro, placing it right alongside Opus 4.6 on one of the toughest coding benchmarks. - And since we co-designed our models with our own silicon, MAI-Thinking-1 is optimized on our MAIA 200 chip. Benchmarking head-to-head against the GB200, we see 30% better performance per dollar as well as a 1.4x performance-per-watt gain when running our MAI models on the MAIA 200 end-to-end.

Next is MAI-Image-2.5 and its Flash variant. Two super strong models now at #2 on the leaderboards, surpassing the score of Nano Banana 2 on image editing.

Last for now is MAI-Code-1-Flash, our new inference efficient coding model, especially tuned for VS Code and GitHub Copilot CLI. - Code-1-Flash achieves 51% on SWE Bench Pro, despite having just 5B parameters, putting it closer to Haiku in size but cheaper in cost.

All of this is the foundation for Microsoft Frontier Tuning. It lets you customize our models to create custom, company-specific agents that only you control. You can make our model, your model. Your data. Your agents. Your moat.

Early adopters are already seeing a difference. When we tuned our models for McKinsey’s tasks, MAI delivered the highest win rate, outperforming GPT-5.5 on quality, while being 10x lower on cost.

Also really excited to be collaborating with the amazing team at Mayo Clinic to jointly train a new frontier AI model for healthcare.

Our announcements today mark another milestone on the road to humanist superintelligence. You can learn more and about our other new models in our latest blog: https://microsoft.ai/news/building-a-hillclimbing-machine-launching-seven-new-mai-models/

27d62K677362

MAI-Image-2.5 has officially released from @MicrosoftAI landing at #2 in the Image Edit Arena (Single-Image-Edit) with a score of 1401 and advances the Pareto frontier!

This puts the model +10 pts over Nano Banana 2, Grok Imagine Image Quality and ChatGPT-Image-Latest-High Fidelity.

Congrats to the @MicrosoftAI team on this big accomplishment!

27d139.1K972198

Chubby♨️@kimmonismus

Mai-1 thinking: Mid size model, 45b active parameter, MoE, side by side with sonnet 4.6

0 distillation

„Microsoft’s first reasoning model“

Chubby♨️@kimmonismus

Mustafa Suleyman, Microsoft AI: 7 new Microsoft Models, no end in sight when it comes to development, orders of magnitude in the next few years

27d108.1K843161

Kyle Lo@kylelostat

happy to share another quality tech report w/ the wider research community 🫶

great read for ppl who want to see all the details for methods + infra for scaling up pretraining & RL, esp detailed discussion about data which is often kept vague by other labs

Mustafa Suleyman@mustafasuleyman

Super excited to announce seven new world-class MAI models today. They represent what we consider a new era in AI designed to keep you in control and on the frontier. First is our text foundation model, MAI-Thinking-1, exceptionally strong on reasoning and SWE tasks. - It’s a 35B active parameter MoE with a 256K context window. Independent human raters on Surge prefer it for overall quality in blind side-by-sides versus Sonnet 4.6, and it’s achieved 97% on AIME 2025, the key measure of its general-purpose reasoning abilities. - It's at 53% on SWE Bench Pro, placing it right alongside Opus 4.6 on one of the toughest coding benchmarks. - And since we co-designed our models with our own silicon, MAI-Thinking-1 is optimized on our MAIA 200 chip. Benchmarking head-to-head against the GB200, we see 30% better performance per dollar as well as a 1.4x performance-per-watt gain when running our MAI models on the MAIA 200 end-to-end.

Next is MAI-Image-2.5 and its Flash variant. Two super strong models now at #2 on the leaderboards, surpassing the score of Nano Banana 2 on image editing.

Last for now is MAI-Code-1-Flash, our new inference efficient coding model, especially tuned for VS Code and GitHub Copilot CLI. - Code-1-Flash achieves 51% on SWE Bench Pro, despite having just 5B parameters, putting it closer to Haiku in size but cheaper in cost.

All of this is the foundation for Microsoft Frontier Tuning. It lets you customize our models to create custom, company-specific agents that only you control. You can make our model, your model. Your data. Your agents. Your moat.

Early adopters are already seeing a difference. When we tuned our models for McKinsey’s tasks, MAI delivered the highest win rate, outperforming GPT-5.5 on quality, while being 10x lower on cost.

Also really excited to be collaborating with the amazing team at Mayo Clinic to jointly train a new frontier AI model for healthcare.

Our announcements today mark another milestone on the road to humanist superintelligence. You can learn more and about our other new models in our latest blog: https://microsoft.ai/news/building-a-hillclimbing-machine-launching-seven-new-mai-models/

27d22.7K347145

Bindu Reddy@bindureddy

Finally, Microsoft enters the AI arena!

They just announced two foundation models - MAI and Mai-Flash

They used to have a great research lab... and early benchmarks look good 🚀🚀

27d26.7K50561

Lisan al Gaib@scaling01

Microsoft also released MAI-Code-1-Flash

It's a 137B parameter MoE coding model with a context length of 256K tokens trained on over 10T tokens.

At launch it will be available only in GitHub Copilot in Visual Studio Code.

It seems to be stronger and more efficient than Claude 4.5 Haiku.

Lisan al Gaib@scaling01

Microsoft introduces MAI-Thinking-1

It's a 1T@35B parameter model pre-trained on 30T tokens with a maximum context length of 256k tokens using 8192 GB200 GPUs.

Based on benchmarks it seems to be around GLM-5 level.

Microsoft also released a comprehensive 109 pages tech-report: https://microsoft.ai/wp-content/uploads/2026/06/main_20260602_2.pdf

27d42K39279

wh@nrehiew_

Super detailed tech report for MAI-Thinking-1, with a ton of info on all stages of the pipeline. I'm surprised so much of this info is released :)

Super long thread on my notes:

Mustafa Suleyman@mustafasuleyman

Super excited to announce seven new world-class MAI models today. They represent what we consider a new era in AI designed to keep you in control and on the frontier. First is our text foundation model, MAI-Thinking-1, exceptionally strong on reasoning and SWE tasks. - It’s a 35B active parameter MoE with a 256K context window. Independent human raters on Surge prefer it for overall quality in blind side-by-sides versus Sonnet 4.6, and it’s achieved 97% on AIME 2025, the key measure of its general-purpose reasoning abilities. - It's at 53% on SWE Bench Pro, placing it right alongside Opus 4.6 on one of the toughest coding benchmarks. - And since we co-designed our models with our own silicon, MAI-Thinking-1 is optimized on our MAIA 200 chip. Benchmarking head-to-head against the GB200, we see 30% better performance per dollar as well as a 1.4x performance-per-watt gain when running our MAI models on the MAIA 200 end-to-end.

Next is MAI-Image-2.5 and its Flash variant. Two super strong models now at #2 on the leaderboards, surpassing the score of Nano Banana 2 on image editing.

Last for now is MAI-Code-1-Flash, our new inference efficient coding model, especially tuned for VS Code and GitHub Copilot CLI. - Code-1-Flash achieves 51% on SWE Bench Pro, despite having just 5B parameters, putting it closer to Haiku in size but cheaper in cost.

All of this is the foundation for Microsoft Frontier Tuning. It lets you customize our models to create custom, company-specific agents that only you control. You can make our model, your model. Your data. Your agents. Your moat.

Early adopters are already seeing a difference. When we tuned our models for McKinsey’s tasks, MAI delivered the highest win rate, outperforming GPT-5.5 on quality, while being 10x lower on cost.

Also really excited to be collaborating with the amazing team at Mayo Clinic to jointly train a new frontier AI model for healthcare.

Our announcements today mark another milestone on the road to humanist superintelligence. You can learn more and about our other new models in our latest blog: https://microsoft.ai/news/building-a-hillclimbing-machine-launching-seven-new-mai-models/

27d19.2K149112

OpenRouter@OpenRouter

Three new @MicrosoftAI models now live on OpenRouter!

Launching together: MAI-Image-2.5, MAI-Transcribe-1.5, and MAI-Voice-2. More on each below 🧵

27d44.9K33673

stochasm@stochasticchasm

some pretty crazy RL graphs, and very interesting decision here to start RL from a checkpoint with no reasoning exposure

elie@eliebakouch

WOW microsoft new "MAI Thinking 1" model comes with a 109 page tech report that looks REALLY detailed, this is amazing

27d16.6K22796

Omar Khattab@lateinteraction

dspy.GEPA used in pretraining data curation in the new Microsoft AI effort :-)

Lakshya A Agrawal@LakshyAAAgrawal

Excited to see the use of GEPA-optimized LLM judges for data filtering in MAI-Thinking-1 model's pre-training pipeline!

27d16.8K23580

Adam Sadovsky@asadovsky

Today we announced MAI-Thinking-1, a strong generalist and reasoning LLM built from the ground up without distilling third-party models. 97% on AIME 2025; 53% on SWE-Bench Pro; preferred by human raters over Sonnet 4.6 (blind side-by-side).

Tech report: https://microsoft.ai/wp-content/uploads/2026/06/main_20260602_2.pdf

27d19.2K24667

Asuka Zheng🎀@VoidAsuka

the best cookbook about how to train a sota llm right now.

elie@eliebakouch

WOW microsoft new "MAI Thinking 1" model comes with a 109 page tech report that looks REALLY detailed, this is amazing

27d16.1K13684

λux@novasarc01

love how MAI-Thinking-1 report views synthetic environments not as evaluation infrastructure but as capability infrastructure.

my current belief is that a lot of progress in agentic systems over the next few years will come less from larger models and more from richer synthetic worlds. reward hacking, deceptive behavior, tool misuse, long-horizon planning failures...these are fundamentally environment-level phenomena. they don't reliably emerge in static benchmarks.

in some sense the strongest labs may end up being the ones with the best world generators rather than the biggest models. imo synthetic environments feel increasingly analogous to self-play in AlphaGo...the real breakthrough is creating an engine that can continuously generate novel experiences with reliable feedback. if that intuition is correct environment diversity may eventually become its own scaling-law variable alongside parameters, tokens and compute.

Hanna Hajishirzi@HannaHajishirzi

MAI-Thinking-1 is out!

Excited to share what we are building and how climbing from scratch (no distillation) actually works: simple recipes, rigorous science, self-distillation, patience, and great infra.

Check out our tech report has the full story of our RL climbs. https://microsoft.ai/wp-content/uploads/2026/06/main_20260602_2.pdf

26d15.1K13078

Lakshya A Agrawal@LakshyAAAgrawal

Excited to see the use of GEPA-optimized LLM judges for data filtering in MAI-Thinking-1 model's pre-training pipeline!

Mustafa Suleyman@mustafasuleyman

Super excited to announce seven new world-class MAI models today. They represent what we consider a new era in AI designed to keep you in control and on the frontier. First is our text foundation model, MAI-Thinking-1, exceptionally strong on reasoning and SWE tasks. - It’s a 35B active parameter MoE with a 256K context window. Independent human raters on Surge prefer it for overall quality in blind side-by-sides versus Sonnet 4.6, and it’s achieved 97% on AIME 2025, the key measure of its general-purpose reasoning abilities. - It's at 53% on SWE Bench Pro, placing it right alongside Opus 4.6 on one of the toughest coding benchmarks. - And since we co-designed our models with our own silicon, MAI-Thinking-1 is optimized on our MAIA 200 chip. Benchmarking head-to-head against the GB200, we see 30% better performance per dollar as well as a 1.4x performance-per-watt gain when running our MAI models on the MAIA 200 end-to-end.

Next is MAI-Image-2.5 and its Flash variant. Two super strong models now at #2 on the leaderboards, surpassing the score of Nano Banana 2 on image editing.

Last for now is MAI-Code-1-Flash, our new inference efficient coding model, especially tuned for VS Code and GitHub Copilot CLI. - Code-1-Flash achieves 51% on SWE Bench Pro, despite having just 5B parameters, putting it closer to Haiku in size but cheaper in cost.

All of this is the foundation for Microsoft Frontier Tuning. It lets you customize our models to create custom, company-specific agents that only you control. You can make our model, your model. Your data. Your agents. Your moat.

Early adopters are already seeing a difference. When we tuned our models for McKinsey’s tasks, MAI delivered the highest win rate, outperforming GPT-5.5 on quality, while being 10x lower on cost.

Also really excited to be collaborating with the amazing team at Mayo Clinic to jointly train a new frontier AI model for healthcare.

Our announcements today mark another milestone on the road to humanist superintelligence. You can learn more and about our other new models in our latest blog: https://microsoft.ai/news/building-a-hillclimbing-machine-launching-seven-new-mai-models/

27d45.5K14764

elie@eliebakouch

what an absolutely incredible tech report, made a ~50 tweet recap (?) of my notes in this thread

elie@eliebakouch

microsoft MAI tech report is a gold mine, one of the most transparent for a model at this scale.

this model uses zero synthetic data or distillation from previous models. this means reasoning, agentic behavior, tool use are all learned fully during post-training with no cold start. bold choice that makes it harder and requires more iterations to reach sota, but you get FULL control over your model series and it proves they are serious about being a frontier lab.

the tech report is insanely detailed and precise about numbers. to give an example, they give the exact MFU across all the iterations of the model, with the exact changes etc. they also share the full scaling ladder recipe, to my knowledge this is the first time i've seen this in a tech report at this scale

let's look at all of this in this likely very long thread 🧵

27d18.8K13568

Microsoft AI announces MAI-Thinking-1 and MAI-Code-1-Flash, claiming they outperform Claude on coding and reasoning benchmarks · Digg