Intel Xeon CPUs Offload Vision Encoding to Speed VLM Serving


Ying Sheng

LMSYS Org @lmsysorg

🚀 New blog: Heterogeneous CPU + GPU EPD Disaggregation to Boost VLM Serving, with Intel Xeon CPUs offloading vision encoding to cut TTFT and boost throughput.

Vision encoding is the bottleneck in image-heavy VLM serving. Offloading it to CPUs changes that. By combining SGLang EPD disaggregation, Dynamo's device-aware weighted router, and @Intel AMX on the Xeon 6747P, we achieved:
✅ 1.2–1.3× lower P99 TTFT and higher request throughput
✅ 1.3–30× lower P99 TPOT
✅ Extra ROI on top of pure GPU EPD disaggregation, at near-zero added cost
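The post credits a device-aware weighted router with splitting encode work between GPU and Xeon CPU workers. The sketch below shows one generic way such weighting can work, smooth weighted round robin, under the assumption that each worker's weight reflects its relative encoding throughput. `EncoderWorker`, `WeightedEncodeRouter`, and the weights used are hypothetical illustrations, not Dynamo's actual router algorithm or API.

```python
from dataclasses import dataclass


@dataclass
class EncoderWorker:
    """A vision-encoder worker on one device (hypothetical model)."""
    name: str
    device: str        # e.g. "gpu" or "cpu"
    weight: int        # assumption: relative encoding throughput
    current: int = 0   # running score used by smooth weighted round robin


class WeightedEncodeRouter:
    """Smooth weighted round robin over heterogeneous encoder workers.

    Hypothetical sketch; not the Dynamo router's actual algorithm or API.
    """

    def __init__(self, workers):
        if not workers or any(w.weight <= 0 for w in workers):
            raise ValueError("need at least one worker with positive weight")
        self.workers = list(workers)
        self.total = sum(w.weight for w in self.workers)

    def pick(self) -> EncoderWorker:
        # Every worker gains its weight; the highest score wins and is then
        # penalized by the total weight, which interleaves picks smoothly.
        for w in self.workers:
            w.current += w.weight
        best = max(self.workers, key=lambda w: w.current)  # first wins ties
        best.current -= self.total
        return best


if __name__ == "__main__":
    router = WeightedEncodeRouter([
        EncoderWorker("gpu-0", "gpu", weight=3),
        EncoderWorker("xeon-0", "cpu", weight=1),
    ])
    # With weights 3:1 the CPU worker receives one of every four encode jobs.
    print([router.pick().name for _ in range(4)])
```

With these assumed weights the four picks come out as `gpu-0, gpu-0, xeon-0, gpu-0`: the CPU takes a proportional share of encoding instead of sitting idle, while the GPU keeps most of the work. A production router would likely also factor in live queue depth, which this sketch ignores.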

Thanks to @inteldevs for the collaboration on this!

10:26 AM · Jun 1, 2026 · 2.8K Views

Sentiment

Users praise LMSYS's CPU-GPU vision encoding disaggregation for VLM serving as a blueprint for future frameworks that will eliminate wasteful compute on vision encoders.

Positive: 100.0% · Negative: 0.0% (2 comments with sentiment)
