Today we're shipping our biggest MLX-VLM release yet: v0.6.0
...and we are raising ๐ธ
This one's about turning your Apple devices into real local agent machines. From your desk to your pocket.
What's new:
โก Speculative decoding everywhere โ Gemma 4 EAGLE3 + DFlash, Qwen MTP, DeepSeek V4 MTP. Faster tokens, less waiting.
๐ค Agent-ready server โ native Anthropic /v1/messages API, stateful /v1/responses, tool calls, Codex context budgets. Plug Claude Code & Codex straight into local models.
๐๏ธ New models galore โ DeepSeek V4, ZAYA1-VL, MiniCPM-V 4.6, LFM2 MoE, Step-3.7 Flash, Laguna + more.
๐จ Image gen & editing โ FLUX.2 (base + klein), PrismML Bonsai.
๐ Audio in โ Qwen3 Omni, Gemma 4 audio, base64 chat audio.
๐งฎ TurboQuant KV cache โ RHT-correct fast paths for leaner memory.
๐ฆ Modular server, better metrics, cleaner streaming.
Run real agents on the hardware already in your hands.
Github: http://github.com/Blaizzy/mlx-vlm