8h ago

Unsloth AI releases MTP-optimized GGUF files for the Qwen3.6-27B and 35B-A3B models, delivering 1.4 to 2.2 times faster generation with no accuracy change on as little as 18 GB of RAM

llama.cpp merged native MTP support on 16 May 2026.

Original post

Qwen3.6 now runs 2x faster with MTP GGUFs! Run locally on just 18GB RAM. ⚡️ MTP enables Qwen3.6 to generate ~1.4–2.2× faster with no accuracy change. Qwen3.6-27B MTP runs at 160 tokens/s. 35B-A3B reaches 240 t/s. GGUFs: https://huggingface.co/unsloth/Qwen3.6-27B-MTP-GGUF Guide: https://unsloth.ai/docs/models/qwen3.6#mtp-guide

6:40 AM · May 18, 2026

finally faster Qwen3.6 models with MTP support ⚡️

brb updating my Pi & Hermes setup 🤝

Georgi Gerganov @ggerganov

llama.cpp adds MTP for the Qwen3.6 family This is a significant milestone for the local AI ecosystem. The performance jump with these changes is massive and elevates local inference on commodity hardware further. Special thanks to Aman Gupta for leading this development! https://github.com/ggml-org/llama.cpp/pull/22673

3:07 PM · May 18, 2026 · 57.2K Views
6:59 PM · May 18, 2026 · 2.9K Views
