Tech · 13h ago

Unsloth AI co-founder Daniel Han releases quantized GGUFs shrinking the 1-trillion-parameter Kimi K2.7 Code model to 325 GB

It runs locally at over 40 tokens per second on setups with 330 GB of combined RAM and VRAM.

Original post

Unsloth AI@UnslothAI

You can now run Kimi K2.7 Code locally! 🌘

We shrank the 1T model to 325GB (-48%) via Dynamic 2-bit where important layers are upcasted.

Run at >40 tok/s on 330GB RAM/VRAM setups.

Run full precision on 610 GB.

Guide: https://unsloth.ai/docs/models/kimi-k2.7-code

GGUF: https://huggingface.co/unsloth/Kimi-K2.7-Code-GGUF
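The sizes quoted in the post can be sanity-checked with a little arithmetic. The sketch below is illustrative only: it reads "1T" as exactly 10^12 parameters and "GB" as decimal gigabytes, and the 32 GB per-GPU figure is an assumed example value, not something the post specifies. Under those assumptions, 325 GB works out to about 2.6 bits per weight on average and 610 GB to about 4.9. The size reduction comes out near 47%, close to the quoted 48%. The small gap may come from rounded file sizes.

```python
import math

PARAMS = 1e12  # assumption: "1T" read as exactly 1e12 parameters


def bits_per_weight(size_gb: float, params: float = PARAMS) -> float:
    """Average bits stored per parameter for a file of size_gb decimal gigabytes."""
    return size_gb * 1e9 * 8 / params


def reduction_pct(small_gb: float, full_gb: float) -> float:
    """Percentage by which small_gb is smaller than full_gb."""
    return (1 - small_gb / full_gb) * 100


def devices_needed(required_gb: float, per_device_gb: float) -> int:
    """Minimum number of equal-sized devices whose memory sums to required_gb.

    Ignores per-device overhead such as KV cache and activations.
    """
    return math.ceil(required_gb / per_device_gb)


quant_bpw = bits_per_weight(325)   # Dynamic 2-bit GGUF size from the post
full_bpw = bits_per_weight(610)    # "full precision" size from the post
shrink = reduction_pct(325, 610)
gpus = devices_needed(330, 32)     # 32 GB per GPU is a hypothetical example value

print(f"quantized: {quant_bpw:.2f} bits/weight")
print(f"full:      {full_bpw:.2f} bits/weight")
print(f"reduction: {shrink:.1f}%")
print(f"GPUs at 32 GB each: {gpus}")
```

The average of roughly 2.6 bits per weight, rather than a flat 2, is consistent with the post's note that important layers are kept at higher precision.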

Kimi.ai@Kimi_Moonshot

🌘 Kimi-K2.7-Code, our latest coding model, is now released and open-sourced!

🔷 Improved coding & agent performance over K2.6: +21.8% on Kimi Code Bench v2, +11.0% on Program Bench, and +31.5% on MLS Bench Lite.

🔷 Reasoning efficiency: Less overthinking, with 30% lower reasoning-token usage compared to K2.6.

🔷 Long-horizon coding: Improved instruction following, higher end-to-end coding task success rates.

⚡️ 6x High-Speed Mode coming soon!

🔌 Available today via Kimi API and Kimi Code.

🔗 Kimi Code: https://kimi.com/code

🔗 API: https://platform.moonshot.ai

5:07 AM · Jun 15, 2026 · 536.6K Views

Sentiment

Positive users praise Unsloth for shrinking the 1T Kimi K2.7 Code model to 325 GB to enable local inference, while negative users object to the prohibitive VRAM costs and the hardware barriers that remain.

Positive: 54.4%

Negative: 45.6%

59 comments with sentiment.

Cluster Engagement

Posts from X

prayag sonar@prayag_sonar

@UnslothAI tips to run it on 12gb vram?

13h ago

Dan C.@count_slopula

@UnslothAI Sweet now I just need another 312gb VRAM!

13h ago

Jonathan 🇺🇲@thaonlyjonathan

@prayag_sonar @UnslothAI The middle out compression algorithm by Pied Piper.

8h ago

DawidDD@dawiddrzala

@UnslothAI Love to see run it locally, only 330 gb vram needed 😀

12h ago

Layton Gott@Layton_Gott

@UnslothAI Sweet buying a rack of Mac Studios with that much ram combined only cost $100k!

7h ago

Rexx DZN@RexxDzn

@prayag_sonar @UnslothAI Hopes and prayers

12h ago

Rayhan Athaillah@THYluhkoyd

@UnslothAI Nice. So I just need to buy 11 RTX 5090s?

10h ago

Tino Thamjarat@V_TNO

@UnslothAI This would actually fit in a single Mac Studio M3 with 512Gb ram.

Unfortunately the option is not there anymore.

13h ago

🅱️aha@baha_jam

@UnslothAI @grok how much do I have to invest for the gear?

9h ago

NadzAI@NadzuAI

@UnslothAI Local AI keeps leveling up.

13h ago

Derek Colley@DerekColley_

@UnslothAI

11h ago

Daniel Birker@DBirker78883

@UnslothAI The fact that we can run a model that large locally now is insane. I’m going all in on local ai.

4h ago

Florian Herrengt@florianherrengt

@prayag_sonar @UnslothAI Use q0.01_x_s

11h ago

RasputinKaiser@RasputinKaiser

@UnslothAI That'll run fantastic on my 3090Ti!

I'll just keep the fire extinguisher handy

8h ago

Lokesh Sparrow@lokesh_sparrow

@UnslothAI That's solid progress on the compression side. Still needs serious hardware, but local runs are getting more realistic for those who have it.

13h ago

𝗕𝗮𝗻𝗱𝗶𝗻𝗼𝗽𝗹𝗮@bandinopla

@UnslothAI How long until it is made illegal to run models locally and people that do be treated as domestic terrorists?

14h ago

Deus 𝕏 Tenebris@jowobritish

@UnslothAI can you shrink it to fit my 8GB RAM laptop?

13h ago

The Awake Sheep 2nd Account@TheAwakeSheep2

@DBirker78883 @UnslothAI "Locally"... Sorry but I don't have 330Gb of VRAM locally.

4h ago

Mia 🍋@Mialygosa

@UnslothAI Locally…

330 Ram/Vram setup

Lel

6h ago

Orthorexic Apopheniac@LittleNonsuch

@UnslothAI How is the IQ of these quants? I wish that answer was obvious because it’s frustrating to invest time setting these things up if they end up being stupid.

10h ago