Robert Nishihara, Ray co-creator, highlights benchmarks showing up to 67% cost savings from prefill-decode disaggregation on AMD MI325X GPUs

Production data reveals the optimization is often over-applied.

Original post

Robert Nishihara (@robertnishihara)

Impressive cost optimization for LLM inference using AMD GPUs.

Anyscale (@anyscalecompute)

Save 67% with prefill-decode disaggregation using Ray + vLLM on AMD GPUs.

https://www.anyscale.com/blog/ray-vllm-prefill-decode-disaggregation-amd-mi325x-67-percent-savings

12:51 PM · Jun 15, 2026 · 3.4K Views

Posts from X

kourosh hakhamaneshi (@CyrusHakha)

One pattern we keep seeing with customers serving LLMs at scale:

Prefill-decode disaggregation is often treated like a magic wand. But the reality is more nuanced.

So we wrote down the core insights for when PD helps, when it does not, and validated them on AMD + vLLM — where the PD path has been much less paved. 🧵
