not sure how you can say the first major innovation since adam was muon and not shampoo. although I'll admit that muon is much simpler to implement in practice!
unless you are saying muon is a special case of shampoo with b2=0?
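For reference, a rough sketch of why that reading is plausible. Assumptions not in the thread: b2 is taken to mean the decay on Shampoo's preconditioner accumulators, so b2=0 means the preconditioners are built from the current gradient G only; G is square and full rank for simplicity (otherwise read the inverse roots as pseudo-inverses); epsilon and grafting are ignored.

```latex
% Assumed notation: G = U \Sigma V^\top is the SVD of the gradient matrix.
\begin{aligned}
L &= G G^\top = U \Sigma^{2} U^\top, \qquad
R = G^\top G = V \Sigma^{2} V^\top, \\
L^{-1/4}\, G\, R^{-1/4}
  &= \left(U \Sigma^{-1/2} U^\top\right)\left(U \Sigma V^\top\right)\left(V \Sigma^{-1/2} V^\top\right) \\
  &= U\, \Sigma^{-1/2}\, \Sigma\, \Sigma^{-1/2}\, V^\top
   = U V^\top .
\end{aligned}
```

So under those assumptions the Shampoo step reduces to the orthogonalized gradient UV^T. Muon is not literally this: it orthogonalizes the momentum buffer rather than the raw gradient, and it approximates UV^T with a Newton-Schulz iteration instead of computing matrix roots.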
Keller's approach (ultra-fast iteration) is promising because it led to the first major innovation since Adam (Muon). CIFAR took only 2 seconds to train end-to-end, which meant he could try many ideas fast. His first unoptimized Muon run was something like 30 seconds, but it was clear it was onto something due to the large drop in steps