Qwen 3.6 27B on a MacBook Pro M5 Max 64GB hitting 34 tokens per second, running locally with atomic[.]chat
90% acceptance rate: most draft tokens matched what the main model would have produced. So the speed gain comes not from skipping quality checks, but from avoiding repeated full-cost decoding steps.
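To make "acceptance" concrete, here is a minimal sketch of greedy draft verification. It is a toy illustration, not the patched llama.cpp code: `verify_draft` and `toy_target` are hypothetical names, and the toy "model" just counts upward. A real implementation scores all draft positions in one forward pass of the main model; the per-token loop here is only for readability.

```python
from typing import Callable, List, Sequence, Tuple


def verify_draft(
    context: Sequence[int],
    draft: Sequence[int],
    target_next: Callable[[List[int]], int],
) -> Tuple[List[int], int]:
    """Accept draft tokens for as long as they match the main model's own choice.

    Returns (tokens emitted this step, number of draft tokens accepted).
    """
    ctx = list(context)
    emitted: List[int] = []
    for tok in draft:
        expected = target_next(ctx)  # what the main model would have produced
        if tok != expected:
            # First mismatch: keep the main model's token, drop the rest of the draft.
            emitted.append(expected)
            return emitted, len(emitted) - 1
        emitted.append(tok)
        ctx.append(tok)
    return emitted, len(draft)


def toy_target(ctx: List[int]) -> int:
    """Hypothetical stand-in for the main model: next token is last + 1 (mod 10)."""
    return (ctx[-1] + 1) % 10


if __name__ == "__main__":
    emitted, accepted = verify_draft([1, 2, 3], [4, 5, 7, 8], toy_target)
    print(emitted, accepted)
```

Every emitted token is one the main model agrees with, which is why the output is unchanged; the draft only decides how many tokens get confirmed per main-model step.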
TurboQuant and GGUF handle the storage and runtime side: the model is compressed enough to run locally, and llama.cpp can keep Apple Silicon fed efficiently instead of stalling on huge weight transfers.
A pretty serious local-inference result. It changes what “laptop AI” can feel like.
Multi-Token Prediction (MTP) for Qwen on llama.cpp!
+40% performance! 90% acceptance rate. Running locally on a MacBook Pro M5 Max 64GB
We patched llama.cpp, quantized Qwen 3.6 27B into GGUF format with TurboQuant, and shipped MTP drafts on top. Benchmark, source code & models👇
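A quick back-of-the-envelope check on how the two headline numbers relate. This assumes the 34 tokens per second figure is the MTP-enabled speed and that "+40%" is measured against the same setup without MTP; neither assumption is confirmed by the post, and `implied_baseline` is a hypothetical helper.

```python
def implied_baseline(tokens_per_sec_with_mtp: float, relative_gain: float) -> float:
    """Baseline speed implied by a boosted speed and its relative gain.

    Assumes boosted = baseline * (1 + relative_gain).
    """
    return tokens_per_sec_with_mtp / (1.0 + relative_gain)


if __name__ == "__main__":
    # 34 tok/s and +40% are the figures quoted in the post.
    baseline = implied_baseline(34.0, 0.40)
    print(f"implied baseline: {baseline:.1f} tokens/sec")
```

Under those assumptions the non-MTP baseline would be roughly 24 tokens per second; the linked benchmark is the place to check the actual figure.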







