that this works okay vs learned routing is indictment enough
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex
I still wish we had something more globally aware than these routers. MoEs are frustrating. What do you mean loss and knowledge scale with total params and "intelligence" with active? Wtf is non-knowledge-based intelligence in an LLM? That's not true humanlike sparsity.
1:52 PM · Jul 4, 2026 · 1.1K Views
