7h ago

Unsloth AI releases Multi-Token Prediction GGUF files for Qwen3.6 models

Unsloth AI released GGUF files for the Qwen3.6 family with Multi-Token Prediction (MTP) support. According to Unsloth, MTP makes generation roughly 1.4 to 2.2 times faster with no change in accuracy: Qwen3.6-27B reaches 160 tokens per second and the 35B-A3B variant reaches 240 tokens per second, and the models can run locally on as little as 18 GB of RAM. The release follows the llama.cpp MTP merge on 16 May 2026 and preserves the models' 262,144-token context length.

Original post

Qwen3.6 now runs 2x faster with MTP GGUFs! Run locally on just 18GB RAM. ⚡️ MTP enables Qwen3.6 to generate ~1.4–2.2× faster with no accuracy change. Qwen3.6-27B MTP runs at 160 tokens/s. 35B-A3B reaches 240 t/s. GGUFs: https://huggingface.co/unsloth/Qwen3.6-27B-MTP-GGUF Guide: https://unsloth.ai/docs/models/qwen3.6#mtp-guide

6:40 AM · May 18, 2026

finally faster Qwen3.6 models with MTP support ⚡️

brb updating my Pi & Hermes setup 🤝

Georgi Gerganov (@ggerganov)

llama.cpp adds MTP for the Qwen3.6 family

This is a significant milestone for the local AI ecosystem. The performance jump with these changes is massive and elevates local inference on commodity hardware further.

Special thanks to Aman Gupta for leading this development!

https://github.com/ggml-org/llama.cpp/pull/22673

3:07 PM · May 18, 2026 · 48.2K Views
6:59 PM · May 18, 2026 · 2.4K Views
