Unsloth AI co-founder Daniel Han runs a 1-bit quantized GLM-5.2 GGUF locally on a Mac Studio at 21.6 tokens per second


Posts from X


Views 7.7K · Bookmarks 12 · Likes 52 · Retweets 10 · Replies 4

Brian Roemmele@BrianRoemmele

Open source wins again!

And this is just 1-bit distilled.

Unsloth AI@UnslothAI

1-bit GLM-5.2 GGUF vs. Claude 4.8 Opus vs. GPT-5.5

We gave 3 models the same prompt and compared one-shot outputs.

The 1-bit GLM-5.2 GGUF ran locally on a Mac Studio M3 Ultra with 256GB RAM at ~21.6 tok/s.

Which output do you like best? GGUF: https://huggingface.co/unsloth/GLM-5.2-GGUF


Kakashi@Meme_God_069

@UnslothAI You have earned this meme.


Aaryan Kakad@aaryan_kakad

@UnslothAI @danielhanchen glm-5.2's design taste is insane.

i just gave it a one liner prompt, just to test it.

and it left me in awe.

check this out:


Keye | 科爷的数字生命@keyelifeai

@UnslothAI I decided to buy a Mac Studio this week.


mr-r0b0t@mr_r0b0t

@UnslothAI this is surprisingly dope! never thought I'd be saying that about a 1-bit!


Yad0@Micr0be

@UnslothAI This is insane. Can you go below 1 bit? Is that possible?


Hikari∣LocalLLM⚡@Hikari_07_jp

@UnslothAI Compressed down to 1 bit and it's this good??? That's seriously amazing!! I was using 2 bits for RAM offload inference because I was afraid of degradation, but I'll try 1 bit too. Thanks unsloth🫶


aleks90386@aleks90386

@UnslothAI The prompt used here was originally shared on r/LocalLLaMA https://www.reddit.com/r/LocalLLaMA/comments/1styxdy/compared_qwen_36_35b_with_qwen_36_27b_for_coding/


witcheer@witcheer

@UnslothAI so 1-bit GLM 5.2 does better than others? interesting


Professionally Clueless@NoClueOnPurpose

@UnslothAI Challenge: Make GLM 5.2 run on 24GB VRAM.


JMB 🧙‍♂️@jmbollenbacher

@UnslothAI @antirez Seems like glm-5.2 quantizes pretty well.

Not sure it's as good as deepseek but it's at least not totally broken at super low quants.


L₳theesan@LatheesanK

@UnslothAI that's some voodoo, like are we for real? 1-Bit quantisation?!!


hadesdevs@hadesdevs

@UnslothAI I have to say, glm looks way better here


𝗠𝗼𝗵𝗮𝗺𝗲𝗱@AIVibecode

Honestly, this is wild.

A 744B model squeezed down to 217GB and still beating Opus at a creative coding task. The video says more than any benchmark here. The GLM output feels richer, the physics are better, and there is clearly more thought behind the result. Unsloth earned the flex.

That said, creative coding is where 1-bit models tend to look their best. Give it a tough math problem and the conversation changes quickly. The interesting part is not that 1-bit beats everything. It is that quantization keeps getting better, and the gap between what is practical and what is not keeps shrinking.

A Mac Studio is expensive, sure, but it is still a lot cheaper than a data center.
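The "744B squeezed down to 217GB" figure can be sanity-checked with a little arithmetic. The sketch below is only a back-of-the-envelope estimate: it takes the parameter count and file size from the post above at face value, assumes 1 GB = 10^9 bytes, and uses the 256GB RAM figure from the Unsloth post. The helper name `bits_per_weight` is made up for illustration.

```python
def bits_per_weight(file_size_gb: float, params_billions: float) -> float:
    """Average stored bits per parameter, assuming 1 GB = 10**9 bytes."""
    total_bits = file_size_gb * 1e9 * 8
    total_params = params_billions * 1e9
    return total_bits / total_params


# Figures quoted in the thread, not independently verified.
FILE_SIZE_GB = 217      # quoted size of the quantized model
PARAMS_BILLIONS = 744   # quoted parameter count
RAM_GB = 256            # Mac Studio M3 Ultra memory from the Unsloth post

avg_bits = bits_per_weight(FILE_SIZE_GB, PARAMS_BILLIONS)
headroom_gb = RAM_GB - FILE_SIZE_GB

print(f"{avg_bits:.2f} bits per weight on average")
print(f"{headroom_gb} GB of RAM left for context and the OS")
```

If those numbers are right, the file averages a little over 2 bits per weight, so "1-bit" reads more like a label for the lowest-precision tier than a literal per-weight average. That would be consistent with a mixed-precision scheme that keeps some tensors at higher precision, though the thread itself does not spell out the recipe.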


Rambone@vinrambone

@UnslothAI Damn, we are down to a single bit now haha. Doesn't seem worth the trade-off, but a cool experiment.


Quipra@Quipra_

@UnslothAI That is impressive. Try that with different bit widths of GLM 5.2. Let's see how it looks.


❖Prisma Dimensional❖@PrismaDimens

@Micr0be @UnslothAI Yes


Proxiee99@yash_raj99

@UnslothAI What the helly, how the hell are you compressing these


The one@waltonoemi

@UnslothAI Would you recommend unsloth studio as a headless mac model server?


Neekhil Vatsa@garfieldII

@UnslothAI Vertical expansion for @UnslothAI. Ship the quantized versions with refurbished hardware. Clean and optimized the way only this team can.

I'll buy, considering how unlikely it is now for anyone to get their hands on affordable hardware of that scale.
