I wonder what people really think about MoEs? It's ok, the voting is anonymous, you can select the option that you really think, deep inside your heart.
Meta's Lucas Beyer launches anonymous MoE poll, sparking debate over parameter scaling and routing efficiency
Story Overview
Meta AI researcher Lucas Beyer posted an anonymous X poll to surface unfiltered opinions on Mixture of Experts (MoE) models. Replies quickly challenged the claim that loss and knowledge scale with total parameter count while intelligence tracks only the active parameters.
Active versus total parameters remain unsettled
Replies stress that loss curves scale with every parameter yet performance narratives often cite only the routed subset, leaving routing efficiency and any non-knowledge intelligence open to doubt.
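To make the active-versus-total distinction concrete, here is a minimal sketch of the parameter arithmetic for a single top-k MoE feed-forward layer. The class name `MoEFFN`, the two-matrix expert shape, and the example sizes are all assumptions chosen for illustration; none of them come from the thread or from any specific model.

```python
from dataclasses import dataclass


@dataclass
class MoEFFN:
    """Hypothetical top-k MoE feed-forward layer, used only for counting."""
    d_model: int       # hidden size
    d_ff: int          # expert inner size
    n_experts: int     # routed experts in the layer
    top_k: int         # routed experts used per token
    n_shared: int = 0  # always-on shared experts (assumption: same shape)

    def expert_params(self) -> int:
        # Assumes each expert is an up projection plus a down projection,
        # with biases ignored.
        return 2 * self.d_model * self.d_ff

    def router_params(self) -> int:
        # Assumes a single linear router: one score per expert.
        return self.d_model * self.n_experts

    def total_params(self) -> int:
        # Everything that has to be stored and trained.
        experts = (self.n_experts + self.n_shared) * self.expert_params()
        return experts + self.router_params()

    def active_params(self) -> int:
        # What a single token actually passes through.
        experts = (self.top_k + self.n_shared) * self.expert_params()
        return experts + self.router_params()


layer = MoEFFN(d_model=1024, d_ff=4096, n_experts=8, top_k=2)
print("total :", layer.total_params())
print("active:", layer.active_params())
print("ratio :", layer.active_params() / layer.total_params())
```

With these made-up sizes, each token touches roughly a quarter of the layer's weights, which is the gap the replies are arguing about.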
Beyer already flagged MoE scaling limits
Months earlier the same researcher noted that MoE is not actually scaling pilled, framing the poll as a follow-up probe rather than a sudden shift in view.
Positive users praise Mixture of Experts models for their elegant loss-free balancing, token-level routing, and faster loss descent, while some negative users dismiss them as ugly hacks or just ensembles with better PR.
Most Activity
I still wish we had something more globally aware than these routers. MoEs are frustrating. What do you mean loss and knowledge scale with total params and "intelligence" with active? Wtf is non-knowledge-based intelligence in an LLM? That's not true humanlike sparsity.
all model architectures live on a spectrum, ask yourself if MoEs are a step in the right direction

that this works okay vs learned routing is indictment enough
@giffmana my infra guy handles it
@giffmana They just need more merch int love.

@teortaxesTex if the routing in the first few layers is *by construction* almost totally determined by the token embeddings, might as well use an arbitrary router instead of a learned router *for those*. or something!
@giffmana I hate them so much that I think you should just train smaller models

@giffmana Let's be honest, if it wasn't a "Yuck", we would have seen more in vision rather than V-MoE 🙃

@teortaxesTex "in the initial several layers" actually wait this makes sense algorithmically with respect to the definition of a resnet, right? so many of the parameters in a resnet are wasted on conditioning embeddings in residualspace then reconditioning for-unembeddings...

@giffmana loss free balancing is so elegant it makes MoEs elegant
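
For readers unfamiliar with the term, the reply above most likely refers to bias-based load balancing, where a per-expert bias steers top-k selection instead of an auxiliary loss; that reading is an assumption, since the reply gives no details. The sketch below is a toy simulation with hypothetical names (`route_top_k`, `update_bias`, `simulate`), synthetic router scores, and an arbitrary step size, not any model's actual implementation.

```python
import random


def route_top_k(scores, bias, k):
    # Pick the k experts with the highest biased score. In this scheme the
    # bias influences selection only; gating weights would still come from
    # the raw scores.
    ranked = sorted(range(len(scores)),
                    key=lambda e: scores[e] + bias[e],
                    reverse=True)
    return ranked[:k]


def update_bias(bias, load, step):
    # Nudge under-loaded experts up and over-loaded experts down by a
    # fixed step; no gradient and no extra loss term is involved.
    mean_load = sum(load) / len(load)
    new_bias = []
    for b, l in zip(bias, load):
        if l < mean_load:
            new_bias.append(b + step)
        elif l > mean_load:
            new_bias.append(b - step)
        else:
            new_bias.append(b)
    return new_bias


def simulate(n_experts=4, k=1, n_tokens=512, n_batches=200, step=0.01, seed=0):
    rng = random.Random(seed)
    # Synthetic preference: higher-indexed experts get higher raw scores.
    skew = [0.3 * e for e in range(n_experts)]
    bias = [0.0] * n_experts
    load = [0] * n_experts
    for _ in range(n_batches):
        load = [0] * n_experts
        for _ in range(n_tokens):
            scores = [rng.random() + skew[e] for e in range(n_experts)]
            for e in route_top_k(scores, bias, k):
                load[e] += 1
        bias = update_bias(bias, load, step)
    return bias, load


if __name__ == "__main__":
    final_bias, final_load = simulate()
    print("bias per expert:", [round(b, 2) for b in final_bias])
    print("last-batch load:", final_load)
```

In this toy setup the favored experts end up with negative bias and the neglected ones with positive bias, which pulls the per-batch load toward uniform without touching the training loss.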

@giffmana Is there a "there's something missing that will make it not yuck" option?

@sameQCU @teortaxesTex if you want to get rid of this you can use incredibly large n-gram embedding tables and superword input tokens but everybody is a coward. here is a random link from online https://arxiv.org/abs/2502.01637

@tenderizzation i like moe because it goes vroom and also improves loss descent on wallclock matched comparisons

@tenderizzation i want to have my cake and eat it too

@tenderizzation MoEs with shared experts are great, shared experts can be loaded while router is computed. bubbles are skill issues

@giffmana Conceptually elegant, but practically frustrating

@giffmana @CSProfKGD MoEs are a beautiful idea, but the way routing and balancing are implemented in real models makes me cry

@giffmana surprised there isn't more love for MoEs i think they're beautiful