Unsloth AI releases MTP (multi-token prediction) optimized GGUF files for the Qwen3.6-27B and Qwen3.6-35B-A3B models, delivering 1.4 to 2.2 times faster generation with no accuracy loss on as little as 18 GB of RAM
llama.cpp merged native MTP support on 16 May 2026.
finally faster Qwen3.6 models with MTP support ⚡️
brb updating my Pi & Hermes setup 🤝
llama.cpp adds MTP for the Qwen3.6 family
This is a significant milestone for the local AI ecosystem. The performance jump with these changes is massive and elevates local inference on commodity hardware further.
Special thanks to Aman Gupta for leading this development!
https://github.com/ggml-org/llama.cpp/pull/22673
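Why would MTP make generation faster without changing output quality? One common way to use MTP heads is as a built-in draft model for speculative decoding. The heads propose a few tokens, the full model verifies them in a single forward pass, and only tokens the full model agrees with are kept. The sketch below assumes llama.cpp uses MTP this way; the announcements above do not spell out the mechanism. It applies the standard speculative-decoding estimate and assumes each drafted token is accepted independently with probability alpha. The names `alpha`, `k`, and `draft_cost` are hypothetical illustration parameters, not llama.cpp options, and the numbers it produces are not the measured 1.4 to 2.2 times figures.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per full-model forward pass when k drafted
    tokens are verified, each accepted independently with probability alpha
    (a simplifying assumption). The count includes the one token the full
    model always contributes at the first rejected (or final) position."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    if alpha == 1.0:
        # Every draft is accepted: k drafted tokens plus one from the full model.
        return float(k + 1)
    # Geometric series 1 + alpha + ... + alpha**k.
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


def estimated_speedup(alpha: float, k: int, draft_cost: float) -> float:
    """Rough wall-clock speedup over plain one-token-per-pass decoding.

    draft_cost is the cost of one draft step as a fraction of a full forward
    pass (hypothetical parameter). Verifying k + 1 positions is assumed to
    cost about one full pass, which roughly holds when decoding is
    memory-bandwidth bound."""
    if draft_cost < 0.0:
        raise ValueError("draft_cost must be non-negative")
    return expected_tokens_per_step(alpha, k) / (1.0 + k * draft_cost)


if __name__ == "__main__":
    # Illustrative inputs only, not measurements of Qwen3.6 or llama.cpp.
    for alpha in (0.5, 0.7, 0.9):
        print(alpha, round(estimated_speedup(alpha, k=3, draft_cost=0.05), 2))
```

Under these assumptions, the speedup grows with the acceptance rate and shrinks as drafting gets more expensive. MTP heads are cheap because they reuse the main model's hidden states. The "no accuracy loss" claim is consistent with this scheme, because the verification step keeps only tokens the full model would itself have produced.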