AMD MI355 Delivers 40% Cost Savings Versus NVIDIA B200 On FP8 Serving

AMD ALERT 🚀 MI355 is now 40% cheaper than B200 for single-node FP8 serving of the GLM5 architecture, 14 weeks after the initial launch of GLM5, on both non-MTP and MTP (with speculative decoding) in SGLang v0.12 for both CUDA and ROCm. SPEED IS THE MOAT!! Great work to @AnushElangovan, @roaner, HaiShaw and his team! The next step is for MI355X to catch up to CUDA in composing production inference optimizations such as FP4, and in distributed inference, where you can gang up MI355 boxes so that per-GPU performance goes up and the cost per million tokens therefore goes down.
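The post does not say how "40% cheaper" is computed. One common way to compare serving cost is dollars per million tokens at a fixed hourly GPU price, and the minimal sketch below assumes that metric. It also shows why ganging up boxes only lowers cost if per-GPU throughput rises: node price and node throughput both scale with GPU count, so the count cancels out. The function names, the $/GPU-hour price, and the throughput figures are hypothetical placeholders, not measured MI355 or B200 numbers.

```python
def cost_per_million_tokens(gpu_hourly_price: float, tokens_per_sec_per_gpu: float) -> float:
    """Dollars per 1M tokens for one GPU at a sustained throughput (hypothetical metric).

    For n GPUs, cost is n * price and throughput is n * tokens_per_sec, so n cancels:
    only per-GPU throughput (or price) moves the result.
    """
    tokens_per_hour = tokens_per_sec_per_gpu * 3600
    return gpu_hourly_price / tokens_per_hour * 1_000_000


def relative_savings(cost_a: float, cost_b: float) -> float:
    """Fraction by which cost_a is cheaper than cost_b (0.4 means 40% cheaper)."""
    return 1 - cost_a / cost_b


if __name__ == "__main__":
    # Assumed placeholder figures for illustration only.
    price = 2.00  # $/GPU-hour (assumed)
    single_node = cost_per_million_tokens(price, 1000.0)  # assumed 1000 tok/s per GPU
    multi_node = cost_per_million_tokens(price, 1250.0)   # assumed higher per-GPU throughput when distributed
    print(f"single-node: ${single_node:.4f} per 1M tokens")
    print(f"multi-node:  ${multi_node:.4f} per 1M tokens")
    print(f"savings: {relative_savings(multi_node, single_node):.0%}")
```

Under this metric, a 40% saving means the cheaper setup's cost per million tokens is 0.6 times the other's.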

10:01 AM · May 19, 2026