11h ago

Z.ai deploys ZCube network architecture for prefill-decode disaggregated LLM inference clusters, cutting switch and optical module CapEx by 33 percent and raising GPU throughput 15 percent.

P99 TTFT latency fell 40.6 percent with the flattened topology.

0
Original post

http://x.com/i/article/2057206923208884224

2:47 PM · May 20, 2026 View on X
Z.ai deploys ZCube network architecture for prefill-decode disaggregated LLM inference clusters, cutting switch and optical module CapEx by 33 percent and raising GPU throughput 15 percent. · Digg