
Researchers Debate Transformer Vs Post-Transformer Architectures In Boxing Ring


Original post

Rohan Paul@rohanpaul_ai

This is probably the most entertaining way to understand one of AI’s hardest debates.

Transformer vs Post-Transformer, argued by leading researchers, inside a real physical boxing ring.

Both technically deep and genuinely entertaining.

I was glued for the entire 1 hour 20 minutes. So many super cool points to learn.

🥊 Transformers

- Transformers still own the present because they work at scale. They are simple, trainable, hardware-friendly, and already power the strongest AI systems we use today.

- The Transformer is basically a memory machine. It stores information as keys and values, then uses attention to pull back the most useful parts when answering.

- The real Transformer advantage is not just “attention.” The bigger advantage is that it fits modern hardware extremely well, so it can process huge batches of tokens fast.

- Scaling is still the brutal rule. If you give Transformers more compute, more data, and more parameters, they usually keep getting better. Any Post-Transformer architecture has to scale just as well, or better.

- It is not enough to look clever on small tests, because the real question is whether it improves faster than Transformers when scaled up.

- A replacement cannot be slightly better. Because the whole AI stack is already built around Transformers, the next architecture may need to be around 10x better to force everyone to switch.

- Transformers are powerful, but they may be brute force. A human does not need to read the entire internet many times to become smart, but current LLMs need enormous data and compute.
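The "memory machine" point above can be made concrete with a minimal sketch of scaled dot-product attention for a single query. This is an illustration only, not code from the debate: the function names `softmax` and `attend` and the toy keys, values, and query are all assumptions chosen for readability, and real models run this over large batched tensors on accelerators.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    # Score each stored key against the query (dot product, scaled by sqrt(d)).
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    # Turn scores into weights that sum to 1.
    weights = softmax(scores)
    # Read out a weighted mix of the stored values.
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return output, weights

# Hypothetical toy memory: three key/value pairs.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]

# A query that points in the direction of the first key.
query = [4.0, -4.0]
output, weights = attend(query, keys, values)
print(weights)  # the first weight is the largest
print(output)   # the readout is dominated by the first value
```

The retrieval is soft: every stored value contributes a little, but the entry whose key best matches the query dominates the result.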

🥊 Post-Transformer

- Post-Transformer people are not saying Transformers are bad. They are saying Transformers may be the best current tool, not the final form of machine intelligence.

- The biggest Post-Transformer target is native reasoning and continual learning. Today’s LLM reasoning often feels like text-based step-by-step work added on top, instead of thinking happening naturally inside the model.

- Latent reasoning is one possible next step. That means the model reasons inside its own hidden internal space, instead of writing every thought out as words.

- Continual learning is still a major weakness. Humans keep learning from experience, but most Transformer-based models are trained, frozen, and then only adapt inside the prompt.

- Long context is not the same as real memory. A model can read a huge prompt, but that is different from building a life history, learning from mistakes, and updating beliefs over time.

- The future may be hybrid, not a clean replacement. Transformers may stay as 1 building block while newer systems add better memory, better reasoning, and better learning loops.

- The most interesting possibility is that Transformers may help discover their own successor. AI agents are already getting better at research and coding, so the next architecture may come from AI-assisted architecture search.
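The continual learning point above can be illustrated with a toy contrast between a frozen model and one that keeps updating. This is a sketch under heavy assumptions: a one-parameter linear model, plain SGD, and a made-up shift from y = 2x to y = 3x. The names `sgd_step` and `train` are hypothetical, and nothing here reflects how any architecture named in the post actually learns.

```python
def sgd_step(w, x, y, lr=0.1):
    # Gradient of 0.5 * (w*x - y)**2 with respect to w is (w*x - y) * x.
    return w - lr * (w * x - y) * x

def train(w, data, lr=0.1):
    # One pass over the data, updating the weight after every example.
    for x, y in data:
        w = sgd_step(w, x, y, lr)
    return w

def mean_squared_error(w, data):
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

inputs = (1.0, 2.0, 1.5, 0.5)
old_world = [(x, 2.0 * x) for x in inputs] * 50  # world seen during training
new_world = [(x, 3.0 * x) for x in inputs] * 50  # world after deployment

# "Trained, then frozen": the weight never changes after this line.
frozen_w = train(0.0, old_world)

# "Continual": the same starting point, but it keeps learning from new data.
online_w = train(frozen_w, new_world)

print(mean_squared_error(frozen_w, new_world))  # stays large
print(mean_squared_error(online_w, new_world))  # close to zero
```

The frozen weight stays near 2 and keeps making the same error on the new data, while the updated weight moves to about 3.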

-------

- Benchmarks are a problem. Many public benchmarks are easy to game, so they may show leaderboard strength without proving deeper intelligence.

- Perplexity is probably still a great metric for evaluating frontier models, because it tests prediction quality.
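For readers unfamiliar with the metric, perplexity is the exponential of the average negative log-probability a model assigns to the tokens that actually occurred. The sketch below uses the standard definition; the function name `perplexity` and the probability lists are made-up illustrations, not measurements from any model.

```python
import math

def perplexity(token_probs):
    # token_probs: probability the model gave to each token that really came next.
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_prob)

# Hypothetical model A gives each correct token probability 0.5.
model_a = [0.5, 0.5, 0.5, 0.5]
# Hypothetical model B gives each correct token probability 0.25.
model_b = [0.25, 0.25, 0.25, 0.25]

print(perplexity(model_a))  # about 2: like choosing among 2 equally likely tokens
print(perplexity(model_b))  # about 4: like choosing among 4 equally likely tokens
```

Lower is better: a perplexity of k means the model is, on average, as uncertain as if it were picking uniformly among k options.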

---

Overall, Transformers continue to dominate, but the frontier is clearly widening.

Pathway’s BDH (Dragon Hatchling — brain-inspired reasoning architecture), Sakana AI’s CTMs (Continuous Thought Machines — models that think over time), and Liquid AI’s LFMs (Liquid Foundation Models — efficient multimodal foundation models) - all of these show how the frontier is expanding.

--- From the “Pathway (pathway[.]com)” YouTube channel (link in comment)

@zuzanna_pathway

7:02 AM · May 29, 2026 · 87.1K Views

Sentiment

Users praise the boxing ring debate format for researchers discussing transformer versus post-transformer architectures as a super cool idea worth kudos.

Pos: 100.0%

Neg: 0.0%

2 comments with sentiment.


Posts from X


Pathway (www.pathway.com)@pathway_com

Here's a great starting point for you to understand the Transformer vs Post Transformer Debate convened by @zuzanna_pathway!

Credits @rohanpaul_ai.



Rohan Paul@rohanpaul_ai

The full video

https://www.youtube.com/watch?v=hCjoMLuCuLQ


Rohan Paul@rohanpaul_ai

🥊 In the ring:

- Łukasz Kaiser, co-inventor of the Transformer, co-author of TensorFlow, co-creator of ChatGPT, GPT-4, GPT-5, and the o1/o3 reasoning models. Researcher at OpenAI and Google Brain.

- Adrian Kosowski, co-founder and Chief Scientific Officer at Pathway, inventor of the Dragon Hatchling (BDH) architecture, and early pioneer of HNSW. PhD at 20, tenured at Inria at 23. Author of 100+ papers across graph algorithms, distributed systems, and quantum information.

- Llion Jones, co-inventor of the Transformer (yes, along with Łukasz Kaiser), but fighting for the Post-Transformer side! Co-founder and CTO of Sakana AI, former Google Brain, building nature-inspired AI systems and evolutionary model merging.

- Mathias Lechner, Co-founder and CTO of Liquid AI and Research Affiliate at MIT CSAIL. One of the minds behind Liquid Neural Networks, with award-winning work on robust and trustworthy machine learning at IST Austria.


Rohan Paul@rohanpaul_ai

@pathway_com @zuzanna_pathway super cool idea from your side, to bring all these leading researchers in a boxing ring.


AdsQ@AdsQnn

@pathway_com @zuzanna_pathway @rohanpaul_ai I honestly thought at first that this was going to be a confrontation with a priest after reading those posts about the Pope. 😄


Zuzanna Stamirowska@zuzanna_pathway

@AdsQnn @pathway_com @rohanpaul_ai XD I mean, it is an almost religious debate haha


AdsQ@AdsQnn

@zuzanna_pathway @pathway_com @rohanpaul_ai Definitely! But lately the Pope has been speaking quite often about AI, and the gentleman in the white and purple outfit looks a bit like a priest, so that was my first thought.😏


Zuzanna Stamirowska@zuzanna_pathway

@rohanpaul_ai @pathway_com Thanks! And honestly, kudos to them for embracing this ;-)


Pathway (www.pathway.com)@pathway_com

@rohanpaul_ai Thanks for sharing your key observations from the debate, @rohanpaul_ai! We're glad you found it useful!
