I wonder what people really think about MoEs? It's ok, the voting is anonymous, you can select the option that you really think, deep inside your heart.
Lucas Beyer polls opinion on MoEs, joking about finger-pointing over routing imbalances and load-balancing failures
Story Overview
Lucas Beyer is using an anonymous poll to surface unfiltered industry views on Mixture of Experts models, zeroing in on the messy reality of routing imbalances and load balancing failures that spark finger-pointing between infrastructure and balancing teams.
Blame Travels Between Teams
Beyer jokingly asks whether, when an MoE deployment stumbles, the load-balancing specialist just redirects blame to the infra team for handling imbalance poorly. The jab underscores how deployment success hinges on coordination that frequently breaks down.
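To make the "imbalance" in this exchange concrete, here is a minimal sketch. It is not taken from Beyer's thread or from any specific MoE codebase. It assumes top-1 routing over a toy list of router logits and scores balance two ways: a max/mean load ratio, and the auxiliary loss popularized by the Switch Transformer, N * sum_i f_i * P_i, where f_i is the fraction of tokens sent to expert i and P_i is the mean router probability for expert i. All function names are hypothetical. The "loss-free balancing" mentioned in the replies is a different approach and is not shown.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_top1(router_logits):
    # Hypothetical toy router: each token goes to its highest-scoring expert
    # (ties resolve to the lowest expert index).
    return [max(range(len(row)), key=row.__getitem__) for row in router_logits]

def expert_loads(router_logits, num_experts):
    # Number of tokens dispatched to each expert.
    assignments = route_top1(router_logits)
    return [assignments.count(e) for e in range(num_experts)]

def imbalance_ratio(loads):
    # max load / mean load: 1.0 is perfectly balanced,
    # num_experts means every token landed on a single expert.
    mean = sum(loads) / len(loads)
    return max(loads) / mean

def switch_aux_loss(router_logits, num_experts):
    # Switch-Transformer-style auxiliary loss N * sum_i f_i * P_i.
    # Equals 1.0 when both f and P are uniform across experts.
    n = len(router_logits)
    loads = expert_loads(router_logits, num_experts)
    probs = [softmax(row) for row in router_logits]
    f = [count / n for count in loads]
    p = [sum(row[e] for row in probs) / n for e in range(num_experts)]
    return num_experts * sum(fi * pi for fi, pi in zip(f, p))

# Four tokens, four experts: one spreads evenly, one collapses onto expert 0.
balanced = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
collapsed = [[2.0, 0.0, 0.0, 0.0] for _ in range(4)]

print(expert_loads(balanced, 4), imbalance_ratio(expert_loads(balanced, 4)))    # [1, 1, 1, 1] 1.0
print(expert_loads(collapsed, 4), imbalance_ratio(expert_loads(collapsed, 4)))  # [4, 0, 0, 0] 4.0
print(switch_aux_loss(balanced, 4))   # ~1.0
print(switch_aux_loss(collapsed, 4))  # > 1.0
```

With evenly spread routing, both metrics sit at their balanced values; when every token piles onto one expert, the ratio reaches the number of experts and the auxiliary loss rises above 1. That rise is the signal a balancing term pushes against, and a collapsed expert is the kind of hot spot the infra side ends up absorbing.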
Real-World Tradeoffs Stay Unresolved
Replies echo that MoEs feel conceptually clean yet frustrating once routing and balancing enter the picture, with some openly preferring smaller dense models. Beyer has not released any poll tallies or further analysis.
Supportive replies praise Mixture of Experts models for their conceptual elegance, efficient token routing, and local serving benefits, while critical ones call them ugly hacks, overhyped ensembles, or frustrating in practice.
If you have this opinion, which I believe to be somewhat common, then what does your load balance guy say to this? Or does he just blame your infra guy too, for not handling imbalance well?
MoEs are elegant when my infra guy (well, me) can handle them well ;)

@giffmana Is there a "there's something missing that will make it not yuck" option?

@giffmana I hate them so much that I think you should just train smaller models

@giffmana my infra guy handles it

@giffmana loss-free balancing is so elegant it makes MoEs elegant

@cs_serdar damn no, and the last two are really too similar. But the like button on your reply is that option now!

@giffmana They just need more merch int love.

@giffmana I'm still waiting for your hot takes after my last year's talk on MoEs. Well, I guess I probably heard some of them during the time we worked together. :p

@giffmana @CSProfKGD MoEs are a beautiful idea, but the way routing and balancing are implemented in real models makes me cry

@giffmana Let's be honest, if it wasn't a "Yuck", we would have seen more in vision than V-MoE 🙃

@giffmana What’s the source of the purist obsession for dense? I find it really surprising

@giffmana MoEs are an extremely ugly hack, but a necessary one, because we keep trying to model a sparse object with dense matrices

@giffmana @m_sirovatka handles it

@giffmana Conceptually elegant, but practically frustrating

@Dorialexander More merch? Like t-shirts and mugs and stuff?

@giffmana I chose 1, not because I think MoE is elegant, but because:
1. Sparse arch is useful
2. Sparse optimization is discrete and inherently NP-hard, so you could not propose an elegant solution anyway. Thus, all heuristics are equally dirty (unless u believe P = NP)

@giffmana surprised there isn't more love for MoEs, i think they're beautiful

@leothecurious Ok, I'll be the one spilling the beans: your infra guy hates you. He just smiles out of politeness.

@_ueaj @giffmana QB 😍

@giffmana In Aristotle's terms: the MoE model is my friend, but the dense model is a better friend!