/AI18h ago

MLX-VLM v0.6.0 Adds Agent APIs, Speculative Decoding, New Models

554795928050.5K

Original posts

Reposts

#813

Original post

Awni Hannun#813

Prince Canuma@Prince_Canuma

Today we're shipping our biggest MLX-VLM release yet: v0.6.0

...and we are raising 💸

This one's about turning your Apple devices into real local agent machines. From your desk to your pocket.

What's new:

⚡ Speculative decoding everywhere — Gemma 4 EAGLE3 + DFlash, Qwen MTP, DeepSeek V4 MTP. Faster tokens, less waiting.

🤖 Agent-ready server — native Anthropic /v1/messages API, stateful /v1/responses, tool calls, Codex context budgets. Plug Claude Code & Codex straight into local models.

👁️ New models galore — DeepSeek V4, ZAYA1-VL, MiniCPM-V 4.6, LFM2 MoE, Step-3.7 Flash, Laguna + more.

🎨 Image gen & editing — FLUX.2 (base + klein), PrismML Bonsai.

🔊 Audio in — Qwen3 Omni, Gemma 4 audio, base64 chat audio.

🧮 TurboQuant KV cache — RHT-correct fast paths for leaner memory.

📦 Modular server, better metrics, cleaner streaming.

Run real agents on the hardware already in your hands.

Github: http://github.com/Blaizzy/mlx-vlm

1:14 PM · Jun 1, 2026 · 50.5K Views

/AI18h ago

MLX-VLM v0.6.0 Adds Agent APIs, Speculative Decoding, New Models

--0--

Original posts

Reposts

#813

Original post

Awni Hannun#813

Prince Canuma@Prince_Canuma

Today we're shipping our biggest MLX-VLM release yet: v0.6.0

...and we are raising 💸

This one's about turning your Apple devices into real local agent machines. From your desk to your pocket.

What's new:

⚡ Speculative decoding everywhere — Gemma 4 EAGLE3 + DFlash, Qwen MTP, DeepSeek V4 MTP. Faster tokens, less waiting.

🤖 Agent-ready server — native Anthropic /v1/messages API, stateful /v1/responses, tool calls, Codex context budgets. Plug Claude Code & Codex straight into local models.

👁️ New models galore — DeepSeek V4, ZAYA1-VL, MiniCPM-V 4.6, LFM2 MoE, Step-3.7 Flash, Laguna + more.

🎨 Image gen & editing — FLUX.2 (base + klein), PrismML Bonsai.

🔊 Audio in — Qwen3 Omni, Gemma 4 audio, base64 chat audio.

🧮 TurboQuant KV cache — RHT-correct fast paths for leaner memory.

📦 Modular server, better metrics, cleaner streaming.

Run real agents on the hardware already in your hands.

Github: http://github.com/Blaizzy/mlx-vlm

1:14 PM · Jun 1, 2026 · 50.5K Views

Sentiment

Many users praised the MLX-VLM v0.6.0 release for adding local agent tools and speculative decoding on Apple devices, calling the work massive and exciting for enabling better local AI apps.

Pos

100.0%

Neg

0.0%

28 comments with sentiment.

Cluster Engagement

Sentiment

Sentiment unavailable for this story.

Cluster Engagement

Views

Comments

Reposts

Bookmarks

Expand data

Posts from X

Most Activity

No ranked X posts are available for this story yet.