11h ago

Z.ai deploys ZCube network architecture for prefill-decode disaggregated LLM inference clusters, cutting switch and optical module CapEx by 33 percent and raising GPU throughput 15 percent.

P99 TTFT latency fell 40.6 percent with the flattened topology.

175718434367.5K

——0——

Original post

http://x.com/i/article/2057206923208884224

Cluster engagement