GLM 5.2 has just been released 🔥
Here it's already running with MLX on two Mac Studios (M3 Ultra).
This is comparable to the latest closed models, with weights you can download, quantize, distill, fine-tune, run.
The post claims the open-weight model performs comparably to leading closed-source systems.
Some users express excitement about GLM 5.2's open weights delivering competitive performance on M3 Ultra Macs, while others question the claims as unverified fluff and criticize the high hardware costs as poor value.

@pcuenq Interconnect is the wall here. Real question: decode tok/s on the two-Studio split, and what quant fits unified memory so you skip the inter-box hop entirely? Single-box at lower precision likely beats split-box at higher. Got numbers?

@pcuenq Man those m3 ultras are covered these days. When m5 ultra apple?

@ukrroot This is mxfp4, which actually fits in a single machine. I'll grab some numbers, but we don't have RDMA enabled so there will be room for improvement.
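
For readers following the "what quant fits unified memory" question, here is a rough back-of-the-envelope sketch of the arithmetic. The parameter count and memory budget below are placeholders chosen for illustration: the thread does not state GLM 5.2's size or how much memory these Mac Studios have. The one firm number is the MXFP4 cost per weight, since the format stores 4-bit elements plus one shared 8-bit scale per block of 32. KV cache, activations, and runtime overhead are ignored, so real headroom is smaller.

```python
def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1e9 bytes); ignores KV cache and activations."""
    return n_params * bits_per_weight / 8 / 1e9


# MXFP4: 4-bit elements plus one shared 8-bit scale per 32-element block.
MXFP4_BITS = 4 + 8 / 32  # 4.25 bits per weight

# Hypothetical values for illustration only, NOT taken from the thread.
HYPOTHETICAL_PARAMS = 700e9   # placeholder parameter count
UNIFIED_MEMORY_GB = 512       # assumed single-machine memory budget

for name, bits in [("bf16", 16), ("int8", 8), ("mxfp4", MXFP4_BITS)]:
    gb = weight_footprint_gb(HYPOTHETICAL_PARAMS, bits)
    verdict = "fits" if gb < UNIFIED_MEMORY_GB else "does not fit"
    print(f"{name:>5}: {gb:8.1f} GB of weights, {verdict} in {UNIFIED_MEMORY_GB} GB")
```

Swap in the real parameter count and the memory of the actual machine to see which precision stays on a single box and avoids the inter-machine hop.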

@pcuenq running this on two m3 ultras is just showing off

@pcuenq How fast is it with your setup?

@pcuenq GLM 5.2 already running on Mac Studios is insane. Open weights and competitive with closed models is the combo 🔥

@pcuenq How are you testing its performance? Are you basing it on real stuff or just fluff?

@pcuenq U r kidding, 14 weeks. 96GB, $11k. For what? 30 tok/s? Nah thanks

@pcuenq What is the color scheme or the theme?

@pcuenq Curious to know about concurrency and bandwidth?

@pcuenq This is absolutely nuts

@pcuenq Check

@pcuenq But I wanna run a swarm of >250 agents at a time

@ukrroot @pcuenq Yes Mr Ultra Hero, we need to know please.

@pcuenq You may try something like TurboQuant, then applying it to the weights themselves

@ukrroot @pcuenq Yes exactly. They can’t share memory. How and why would I connect two?

@pcuenq still waiting for new studio macs