8h ago

Unsloth AI releases MTP-optimized GGUF files for Qwen3.6-27B and Qwen3.6-35B-A3B that generate about 1.4 to 2.2 times faster with no change in accuracy and run locally on as little as 18 GB of RAM


llama.cpp merged native MTP support on 16 May 2026.

Original post

#772 @DANIELHANCHEN · OP

Unsloth AI@UNSLOTHAI

Qwen3.6 now runs 2x faster with MTP GGUFs! Run locally on just 18GB RAM. ⚡️

MTP enables Qwen3.6 to generate ~1.4–2.2× faster with no accuracy change.

Qwen3.6-27B MTP runs at 160 tokens/s. 35B-A3B reaches 240 t/s.

GGUFs: https://huggingface.co/unsloth/Qwen3.6-27B-MTP-GGUF

Guide: https://unsloth.ai/docs/models/qwen3.6#mtp-guide

6:40 AM · May 18, 2026

Reposted by

#772@DANIELHANCHEN

#675@MERVENOYANN

QUOTE POST

#675 merve @MERVENOYANN

finally faster Qwen3.6 models with MTP support ⚡️

brb updating my Pi & Hermes setup 🤝

Georgi Gerganov @ggerganov

llama.cpp adds MTP for the Qwen3.6 family

This is a significant milestone for the local AI ecosystem. The performance jump with these changes is massive and elevates local inference on commodity hardware further.

Special thanks to Aman Gupta for leading this development!

https://github.com/ggml-org/llama.cpp/pull/22673

3:07 PM · May 18, 2026 · 57.2K Views

6:59 PM · May 18, 2026 · 2.9K Views


