AI Labs Shift Focus From Model Size To Post-Transformer Architectures

AI’s foundation model race is shifting from who has the biggest model to which architecture can outgrow the transformer.

Architecture is becoming the real fault line in AI.

Mapping the Foundation Model Landscape:

The AI market is usually mapped by who is winning. The more consequential question is which research bet wins.

This thread maps the foundation model market by what each lab is building and which architecture it is betting on, rather than by who raised the most money or had the loudest launch.

It is organized around the divide that will define the next 2 years.

The 2 real axes are scope and architecture: scope asks whether a lab is building a general model or a domain model, while architecture asks whether it is still scaling transformers or moving into the Post-Transformer camp.
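
As a minimal sketch, the two axes can be written as a small lookup that yields the four regions the later posts in this thread walk through. The enum names, the `REGIONS` table, and the `region_for` helper are hypothetical and only illustrate the structure. The region labels and example labs come from the thread itself.

```python
from enum import Enum


class Scope(Enum):
    GENERAL = "general model"
    DOMAIN = "domain model"


class Architecture(Enum):
    CURRENT_PARADIGM = "still scaling transformers"
    POST_TRANSFORMER = "moving beyond the transformer"


# Hypothetical lookup table: one region per (scope, architecture) pair,
# numbered the way the thread numbers them.
REGIONS = {
    (Scope.GENERAL, Architecture.CURRENT_PARADIGM): "Region 1: Frontier Model labs",
    (Scope.DOMAIN, Architecture.CURRENT_PARADIGM): "Region 2: domain specialists",
    (Scope.GENERAL, Architecture.POST_TRANSFORMER): "Region 3: Post-Transformer challengers",
    (Scope.DOMAIN, Architecture.POST_TRANSFORMER): "Region 4: applied Post-Transformer",
}


def region_for(scope: Scope, architecture: Architecture) -> str:
    """Return the map region for a lab's position on the two axes."""
    return REGIONS[(scope, architecture)]


if __name__ == "__main__":
    # Example placements taken from the thread's own region lists.
    examples = {
        "Anthropic": (Scope.GENERAL, Architecture.CURRENT_PARADIGM),
        "Poolside": (Scope.DOMAIN, Architecture.CURRENT_PARADIGM),
        "Liquid AI": (Scope.GENERAL, Architecture.POST_TRANSFORMER),
        "Cartesia": (Scope.DOMAIN, Architecture.POST_TRANSFORMER),
    }
    for lab, position in examples.items():
        print(f"{lab}: {region_for(*position)}")
```

Each lab sits in exactly one cell, so the map is a classification by research bet, not a ranking.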

The transformer still dominates because it turned attention into a scalable machine for prediction, and that 2017 design remains the backbone of modern foundation models.

The pressure now comes from a simple weakness: attention gets expensive as context grows, while real products increasingly demand long memory, low latency, and continuous interaction.

That is why the most interesting labs are no longer just asking who can train the largest model.

They are asking whether intelligence needs a different operating rhythm.

🧵 1/8

3:14 PM · Jul 1, 2026 · 5.9K Views

Posts from X

Rohan Paul @rohanpaul_ai

🧵 8/8 The strongest pattern in the map is that the top half is converging while the bottom half is spreading out.

The Frontier Model labs are becoming more similar because the transformer playbook is now widely shared: mixture-of-experts, long context, reasoning training, tool use, and multimodal input are becoming common.

The Post-Transformer side looks less tidy, but that is where the deeper architecture question is still open.

The gap that matters most on this map is @fchollet 's ARC-AGI, designed so that memorization doesn't help and only novel-task reasoning counts.

Top frontier models now reach the mid-80s on ARC-AGI-2. On the new interactive ARC-AGI-3, frontier models out of the box score below 1% while humans solve 100% of it. Even the best harness-built agents remain far from human efficiency.

Closing that gap is what every lab on this landscape is ultimately chasing.

安叫兽|Bird🕊️ 🔶 BNB @ajs6888

@rohanpaul_ai It feels like the gap in the next round may no longer come down to parameter count.

Rohan Paul @rohanpaul_ai

🧵 3/8 Region 1 is the Frontier Model lab region, which means current-paradigm, general-purpose AI.

This includes OpenAI, Anthropic, Google DeepMind, xAI, Meta, Mistral, Cohere, Alibaba Qwen, DeepSeek, Moonshot, Thinking Machines, Reflection AI, Microsoft AI, MiniMax, Zhipu (now branded http://Z.ai), and ByteDance Seed.

A Frontier Model is a model near the top of the market on broad capability, so it can write, code, reason, use tools, follow long context, and handle many tasks instead of being narrow.

These labs are mostly still improving transformer-based systems, so the fight is now about data, reasoning quality, inference cost, latency, openness, safety work, product surface, and distribution.

What to watch. Whether the next gains come from reinforcement learning pushed beyond easily-verified domains and from continual learning that lets models keep learning during deployment, and whether the open-weight frontier, led by Chinese labs and entrants like Reflection, closes the gap to the closed one.

Rohan Paul @rohanpaul_ai

🧵 5/8 Region 3 is the Post-Transformer challenger region, which means general-purpose labs trying to move beyond the standard transformer recipe.

This includes Pathway, Liquid AI, NXAI, AI21 Labs, NVIDIA Nemotron, Inception, AMI Labs, World Labs, General Intuition, Sakana AI, and Safe Superintelligence.

Post-Transformer does not mean every lab has thrown away attention completely.

It means the lab is testing whether state-space models, recurrence, diffusion language models, world models, hybrid attention, or new undisclosed designs can handle memory, speed, long context, and continual learning better than today’s transformer-heavy systems.

What to watch. The first credible demonstration that a non-transformer or hybrid architecture matches a frontier transformer at frontier scale. That single result would convert this region from a thesis into a movement most labs would race towards.

Rohan Paul @rohanpaul_ai

🧵 7/8 The map also explains why Neolab has become a useful word for this market.

Building on @deedydas 's 'neolabs' framing, extended here in pattern #2.

A Neolab is a newer research-first AI lab, often started by people from OpenAI, Google Brain, DeepMind, Anthropic, Meta, or xAI, and built around a scientific bet rather than a normal software product.

Reflection AI, Thinking Machines, Safe Superintelligence, AMI Labs, Pathway, Sakana AI, and similar labs fit this pattern.

Some Neolab companies are chasing open-weight Frontier Model systems, while others are trying to prove Post-Transformer designs that could change the whole Foundation Model layer.

Rohan Paul @rohanpaul_ai

🧵 2/8 A Foundation Model maker is a company where the model itself is the core product, and the model’s performance is what gives the company its value.

That is why this map leaves out application companies like Cursor, Perplexity, Harvey, Glean, and Sierra, because they mostly build product layers, workflows, and distribution around models made by other labs.

If a company can swap the model underneath and the product still feels mostly the same, then it is probably an application company, not a Foundation Model maker.

Rohan Paul @rohanpaul_ai

🧵 6/8 Region 4 is the applied Post-Transformer region, where the new architecture bet is already tied to a specific product surface.

Cartesia uses state-space style models for real-time voice, Decart and Odyssey work on interactive world models, and Waabi and Wayve apply world-model ideas to autonomous driving.

These domains care a lot about low latency, continuous generation, and fast reaction.

A transformer can be powerful, but its attention step gets expensive as context grows, because the model keeps comparing many pieces of information against many other pieces.

That is why Post-Transformer systems are especially interesting in voice, driving, and interactive media.

What to watch. Whether the latency and real-time advantages that win in voice and driving generalize into the larger consumer surfaces, especially interactive media, where a real-time world model would be a genuinely new category rather than a faster version of an old one.
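
The attention-cost point in this post can be made concrete with a rough count. This is a minimal sketch under simplifying assumptions: it counts only token-to-token comparisons for one full self-attention pass and one state update per token for a recurrent or state-space style scan. It ignores hidden size, layers, heads, causal masking, caching, and any sparse or hybrid attention scheme. The function names are hypothetical.

```python
def attention_comparisons(context_len: int) -> int:
    """Token-to-token comparisons in one full self-attention pass (n * n)."""
    return context_len * context_len


def recurrent_updates(context_len: int) -> int:
    """State updates in a recurrent or state-space style scan (one per token)."""
    return context_len


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        a = attention_comparisons(n)
        r = recurrent_updates(n)
        # The ratio a // r equals n: the gap widens as the context grows.
        print(f"context={n:>7,}  attention={a:>15,}  recurrent={r:>7,}  ratio={a // r:,}x")
```

Under these assumptions, growing the context 10x multiplies the attention count by 100 but the recurrent count by only 10, which is the scaling gap that matters for long-running, low-latency products.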

Rohan Paul @rohanpaul_ai

🧵 4/8 Region 2 is the domain specialist region, which means current-paradigm, vertical Foundation Model makers.

These labs use familiar transformer or diffusion foundations, but they focus hard on 1 domain where a general model may not be good enough.

Poolside focuses on coding, ElevenLabs and Deepgram focus on voice, Suno focuses on music, Black Forest Labs and Midjourney focus on image generation, Runway and Luma focus on video, and companies like Chai Discovery, Xaira, Generate Biomedicines, Isomorphic Labs, and Profluent focus on biology.

The bet here is that deep domain data, trusted evaluation, and workflow fit can beat a broader model inside a narrow market.

Manas Chopra @themanasai

@rohanpaul_ai I guess @NEA shared a useful list recently too. But this thread is a gem, IMO!

Rimsha Bhardwaj @heyrimsha

@rohanpaul_ai This is a map that makes the model race look like a systems war.

Owlfy.ai @Owlfy_ai

@rohanpaul_ai Transformers took off thanks to scalable attention and a huge ecosystem behind them. Which new architecture do you think actually has a shot at matching that—SSMs, hybrids, or something else?

Jasper 🌰@building BBX @bbxjasper

@rohanpaul_ai The tell is how much of the recent jump came from inference-time compute and better data pipelines, not a genuinely new architecture. "Beat the transformer" has been the pitch for years now. Which challenger would you actually put money on outgrowing it?

atypica.AI @atypica_AI

@rohanpaul_ai Thanks a lot for sharing; I really like your content.

I would love to set up a paid collaboration with you. Could you please check your direct messages?

Thank you very much.

Mudit Srivastava @muditjps

@ajs6888 @rohanpaul_ai Completely agree!

Tom Kornblit @tomkornblit

@rohanpaul_ai For enterprise-grade systems, the 'operating rhythm' isn't just about compute efficiency—it’s about moving from 'model-centric' development to 'governance-centric' architecture. We’re building LGM (Large Governance Models) not to replace the foundation, but to constrain it.

Nexus @nexusxrobotics

@rohanpaul_ai architecture shift maps directly to our shared observation spaces. We route policies across embodiments only when the action representations line up.
