/AI18h ago

MLX-VLM v0.6.0 Adds Agent APIs, Speculative Decoding, New Models

--0--
Original posts
Reposts
Original postAwni Hannun#813
Prince Canuma@Prince_Canuma

Today we're shipping our biggest MLX-VLM release yet: v0.6.0

...and we are raising ๐Ÿ’ธ

This one's about turning your Apple devices into real local agent machines. From your desk to your pocket.

What's new:

โšก Speculative decoding everywhere โ€” Gemma 4 EAGLE3 + DFlash, Qwen MTP, DeepSeek V4 MTP. Faster tokens, less waiting.

๐Ÿค– Agent-ready server โ€” native Anthropic /v1/messages API, stateful /v1/responses, tool calls, Codex context budgets. Plug Claude Code & Codex straight into local models.

๐Ÿ‘๏ธ New models galore โ€” DeepSeek V4, ZAYA1-VL, MiniCPM-V 4.6, LFM2 MoE, Step-3.7 Flash, Laguna + more.

๐ŸŽจ Image gen & editing โ€” FLUX.2 (base + klein), PrismML Bonsai.

๐Ÿ”Š Audio in โ€” Qwen3 Omni, Gemma 4 audio, base64 chat audio.

๐Ÿงฎ TurboQuant KV cache โ€” RHT-correct fast paths for leaner memory.

๐Ÿ“ฆ Modular server, better metrics, cleaner streaming.

Run real agents on the hardware already in your hands.

Github: http://github.com/Blaizzy/mlx-vlm

1:14 PM ยท Jun 1, 2026 ยท 50.5K Views
Sentiment
Sentiment unavailable for this story.
Cluster Engagement
-
Views
-
Comments
-
Reposts
-
Bookmarks
Expand data
Posts from X
Most Activity
Most ActivityTimeline
No ranked X posts are available for this story yet.