3h ago

Unsloth AI releases Multi-Token Prediction GGUF files for Qwen3.6 models


Unsloth AI has released GGUF files for the Qwen3.6 family with Multi-Token Prediction (MTP) support. According to Unsloth, MTP makes generation roughly 1.4 to 2.2 times faster with no change in accuracy. The company reports 160 tokens per second for Qwen3.6-27B and 240 tokens per second for the 35B-A3B variant, and says the models can run locally on 18 GB of RAM. The release follows the llama.cpp MTP merge on 16 May 2026 and preserves the models' 262,144-token context length.
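To put the reported figures in perspective, the following is a minimal arithmetic sketch in Python. It assumes (the source does not say so explicitly) that the 160 and 240 tokens-per-second figures were measured with MTP enabled and that the 1.4 to 2.2 times speedup range applies to the same, unspecified hardware. The function names are hypothetical and exist only for this illustration.

```python
# Back-of-the-envelope arithmetic on the figures quoted in the summary.
# Assumption: the throughput numbers are MTP-enabled measurements and the
# 1.4-2.2x speedup range applies to them. The source does not confirm this.

REPORTED_MTP_TOKENS_PER_S = {
    "Qwen3.6-27B": 160.0,
    "Qwen3.6-35B-A3B": 240.0,
}
SPEEDUP_RANGE = (1.4, 2.2)  # (lowest, highest) reported speedup factor


def implied_baseline(mtp_tokens_per_s: float, speedup: float) -> float:
    """Throughput without MTP implied by an MTP throughput and a speedup."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return mtp_tokens_per_s / speedup


def baseline_range(mtp_tokens_per_s: float) -> tuple[float, float]:
    """Return (low, high) implied non-MTP throughput for the speedup range."""
    low_speedup, high_speedup = SPEEDUP_RANGE
    # The largest speedup implies the slowest baseline, and vice versa.
    return (
        implied_baseline(mtp_tokens_per_s, high_speedup),
        implied_baseline(mtp_tokens_per_s, low_speedup),
    )


def seconds_to_generate(n_tokens: int, tokens_per_s: float) -> float:
    """Wall-clock seconds needed to generate n_tokens at a steady rate."""
    return n_tokens / tokens_per_s


if __name__ == "__main__":
    for name, tps in REPORTED_MTP_TOKENS_PER_S.items():
        low, high = baseline_range(tps)
        print(f"{name}: {tps:.0f} tok/s with MTP, "
              f"implied baseline {low:.1f} to {high:.1f} tok/s, "
              f"1000 tokens in {seconds_to_generate(1000, tps):.2f} s")
```

Under these assumptions, the implied non-MTP throughput would be roughly 73 to 114 tokens per second for the 27B model and roughly 109 to 171 for the 35B-A3B variant. Actual numbers depend on hardware and quantization, neither of which is stated here.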

Original post

#772 @DANIELHANCHEN (OP)

Unsloth AI @UNSLOTHAI

Qwen3.6 now runs 2x faster with MTP GGUFs! Run locally on just 18GB RAM. ⚡️ MTP enables Qwen3.6 to generate ~1.4–2.2× faster with no accuracy change. Qwen3.6-27B MTP runs at 160 tokens/s. 35B-A3B reaches 240 t/s. GGUFs: https://huggingface.co/unsloth/Qwen3.6-27B-MTP-GGUF Guide: https://unsloth.ai/docs/models/qwen3.6#mtp-guide

6:40 AM · May 18, 2026

Reposted by

#772 @DANIELHANCHEN

ORIGINAL POST

#781 Georgi Gerganov @GGERGANOV

llama.cpp adds MTP for the Qwen3.6 family

This is a significant milestone for the local AI ecosystem. The performance jump with these changes is massive and elevates local inference on commodity hardware further.

Special thanks to Aman Gupta for leading this development!

github.com/ggml-org/llama.cpp/pull/22673

3:07 PM · May 18, 2026 · 9.9K Views
