Expert Questions if New Architectures Can Slash LLM Training Costs 100x

Original post

How much of the cost of training LLMs (and the like) is tied to Transformers and their variants? Is there any reason to believe or expect that we can have an architecture that is 2-3 orders of magnitude cheaper with similar behaviour? Or is there a fundamental limit?

5:12 PM · Jun 10, 2026

I am not talking about sample efficiency or new capabilities. Just the compute cost.

Of course, the cost depends on the hardware. The question can be relaxed: If we are allowed to change the hardware minimally (*), can we come up with a much cheaper architecture?

(*) By minimally, I mean something that can be designed and mass-produced by the current chip makers.

P.S. I am not following the architecture design efforts, so this question might have a simple answer. I don't want to ask ChatGPT either, at least not yet.
