Unsloth AI releases Multi-Token Prediction GGUF files for Qwen3.6 models
Unsloth AI has released GGUF files for the Qwen3.6 family that incorporate Multi-Token Prediction (MTP) optimizations. The files deliver 1.4 to 2.2 times faster generation with no accuracy loss. The Qwen3.6-27B model reaches 160 tokens per second and the 35B-A3B variant reaches 240 tokens per second, and both can run locally with 18 GB of RAM. The release follows the merge of MTP support into llama.cpp on 16 May 2026 and preserves the models' 262,144-token context length.
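To put the quoted figures in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes, which the announcement does not state, that the 1.4 to 2.2 times speedup range applies directly to the quoted tokens-per-second figures. The helper names and the 1,000-token response length are illustrative only.

```python
# Figures quoted in the announcement.
MTP_TOKENS_PER_SECOND = {"Qwen3.6-27B": 160.0, "Qwen3.6-35B-A3B": 240.0}
SPEEDUP_RANGE = (1.4, 2.2)  # lower and upper bound of the reported speedup


def implied_baseline(mtp_tok_s: float, speedup: float) -> float:
    """Throughput without MTP, ASSUMING the speedup applies to this figure."""
    return mtp_tok_s / speedup


def seconds_to_generate(n_tokens: int, tok_s: float) -> float:
    """Time to generate n_tokens at a steady throughput of tok_s."""
    return n_tokens / tok_s


if __name__ == "__main__":
    response_length = 1000  # hypothetical response size, for illustration
    low_speedup, high_speedup = SPEEDUP_RANGE
    for name, tok_s in MTP_TOKENS_PER_SECOND.items():
        # A larger speedup implies a slower baseline, and vice versa.
        slowest = implied_baseline(tok_s, high_speedup)
        fastest = implied_baseline(tok_s, low_speedup)
        secs = seconds_to_generate(response_length, tok_s)
        print(
            f"{name}: implied non-MTP baseline {slowest:.0f}-{fastest:.0f} tok/s; "
            f"{response_length} tokens in about {secs:.1f} s with MTP"
        )
```

Actual throughput depends on hardware, quantization, and prompt, so these numbers are only an order-of-magnitude guide.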
llama.cpp adds MTP for the Qwen3.6 family
This is a significant milestone for the local AI ecosystem. The speedup from these changes is substantial and makes local inference on commodity hardware more practical.
Special thanks to Aman Gupta for leading this development!