Intel Xeon CPUs Offload Vision Encoding to Speed VLM Serving

Ying Sheng #603
LMSYS Org (@lmsysorg)

🚀 New blog: Heterogeneous CPU + GPU EPD Disaggregation to Boost VLM Serving, with Intel Xeon CPUs offloading vision encoding to cut TTFT and boost throughput.

Vision encoding is the bottleneck in image-heavy VLM serving. Offloading it to CPUs changes that. By using SGLang EPD disaggregation + Dynamo device-aware weighted router + @Intel AMX on Xeon 6747P, we achieved:
✅ 1.2-1.3× lower P99 TTFT & higher request throughput
✅ 1.3-30× lower P99 TPOT
✅ Extra ROI on top of pure GPU EPD disaggregation, at near-zero added cost
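To picture what a device-aware weighted router does in the encode stage, here is a minimal sketch. It is not Dynamo's router or SGLang's API. It assumes each encoder worker (a GPU or a Xeon CPU) is given a static weight proportional to its measured vision-encode throughput, and it substitutes smooth weighted round-robin (the nginx-style scheme) to split encode requests in proportion to those weights. The class names, worker names, and weights are hypothetical.

```python
"""Hypothetical sketch of a weighted router for the vision-encode stage.

Assumptions (not Dynamo's API): each encoder worker carries a static
weight proportional to its measured encode throughput, and requests are
dispatched with smooth weighted round-robin.
"""
from dataclasses import dataclass


@dataclass
class EncoderWorker:
    name: str         # hypothetical worker id, e.g. "gpu-0" or "xeon-0"
    device: str       # "gpu" or "cpu"; a label only in this sketch
    weight: int       # assumed relative encode capacity (e.g. images/s)
    current: int = 0  # running score used by smooth weighted round-robin


class WeightedEncodeRouter:
    """Smooth weighted round-robin over heterogeneous encoder workers."""

    def __init__(self, workers: list[EncoderWorker]) -> None:
        if not workers:
            raise ValueError("need at least one encoder worker")
        self.workers = workers
        self.total = sum(w.weight for w in workers)

    def pick(self) -> EncoderWorker:
        # Every worker earns its weight; the current leader is chosen and
        # then pays back the total, so picks interleave in proportion to weight.
        for w in self.workers:
            w.current += w.weight
        best = max(self.workers, key=lambda w: w.current)
        best.current -= self.total
        return best


if __name__ == "__main__":
    # Illustrative weights only: one GPU encoder assumed 3x one Xeon encoder.
    router = WeightedEncodeRouter([
        EncoderWorker("gpu-0", "gpu", 3),
        EncoderWorker("xeon-0", "cpu", 1),
    ])
    print([router.pick().name for _ in range(8)])
```

With assumed 3:1 weights, eight picks send six requests to the GPU encoder and two to the Xeon encoder, interleaved rather than bunched. Smooth weighted round-robin is used here only because it is simple and deterministic; a production router would likely also account for live queue depth.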

Thanks to @inteldevs for the collaboration on this!

10:26 AM · Jun 1, 2026 · 2.8K Views