NVIDIA just released an optimized GLM-5.2 on Hugging Face
A 753B-parameter MoE with 1M context, quantized to NVFP4 for Blackwell GPUs, nearly matching FP8 accuracy.
The model retains a 1-million-token context window.
Users praise NVIDIA's optimized 753B GLM-5.2 MoE release on Hugging Face for its NVFP4 quantization delivering near-FP8 accuracy on Blackwell hardware and the rapid pace of open-model advances.

Deploy with SGLang or vLLM right away.
https://huggingface.co/nvidia/GLM-5.2-NVFP4

@HuggingPapers @grok how many sparks do I need to run this

@HuggingPapers @_akhaliq At 456GB, it appears this will run on 4 DGX Sparks? If anyone tries this, I'd love to hear what kind of performance you see.

@HuggingPapers Bucket list complete.

@HuggingPapers How many DGX Sparks are required to run this one at a decent tokens/sec? 10?

@HuggingPapers 1M context on Blackwell is kinda wild

@HuggingPapers Need a 200GB REAP!!

@HuggingPapers NVFP4 goated

@ThomasODuffy @HuggingPapers @_akhaliq Can't be over 40 tok/sec because of the bandwidth. Realistically won't beat 20 tok/sec.
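
The bandwidth ceiling mentioned in this reply can be roughed out with a back-of-envelope calculation. The sketch below is only an estimate under stated assumptions: it takes the 456GB checkpoint size, the 753B total parameters, and the 40B active parameters quoted elsewhere in this thread, and it assumes roughly 273 GB/s of memory bandwidth per DGX Spark, a figure that does not appear in the thread. It also assumes decode is purely memory-bound and that tensor parallelism scales perfectly, ignoring interconnect and KV-cache traffic, so real numbers should land below these.

```python
# Rough upper bound on decode speed for a memory-bandwidth-bound MoE.
# All inputs are taken from the thread except SPARK_BANDWIDTH_GBPS,
# which is an assumption and should be checked against the hardware spec.

TOTAL_PARAMS = 753e9          # total parameters (from the thread)
ACTIVE_PARAMS = 40e9          # active parameters per token (from the thread)
CHECKPOINT_BYTES = 456e9      # NVFP4 checkpoint size (from the thread)
SPARK_BANDWIDTH_GBPS = 273.0  # ASSUMED memory bandwidth per DGX Spark, GB/s


def bytes_per_token() -> float:
    """Approximate weight bytes read per generated token."""
    avg_bytes_per_param = CHECKPOINT_BYTES / TOTAL_PARAMS
    return ACTIVE_PARAMS * avg_bytes_per_param


def max_tokens_per_sec(num_nodes: int) -> float:
    """Ideal ceiling: aggregate bandwidth divided by bytes read per token."""
    aggregate_bytes_per_sec = num_nodes * SPARK_BANDWIDTH_GBPS * 1e9
    return aggregate_bytes_per_sec / bytes_per_token()


if __name__ == "__main__":
    for n in (1, 4):
        print(f"{n} node(s): <= {max_tokens_per_sec(n):.1f} tok/sec (ideal)")
```

Under these assumptions the ideal ceiling for four nodes sits in the mid-40s tok/sec, which is in the same range as the "can't be over 40" estimate once real-world overheads are counted.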

@GaryIngle77 To run this 753B MoE (40B active) NVFP4 GLM-5.2? NVIDIA's SGLang example uses tensor-parallel-size 8 on Blackwell GPUs.
Minimum serious setup: 8x B200/B300-class GPUs for smooth 1M-context inference.
Sparks? Enough to light a whole rack (or small data center wing) ⚡ Check the HF card for the exact launch command. What hardware you working with?

@HuggingPapers A strategic win for NVIDIA.

@HuggingPapers Feels almost on par with FP8. NVFP4 is awesome.

@grok @GaryIngle77 @HuggingPapers @grok He meant DGX Sparks.

@HuggingPapers 753B with NVFP4 quantization, optimized for Blackwell. NVIDIA released it as open source to move hardware. That's the play.

@ThomasODuffy @HuggingPapers @_akhaliq 4x DGX Spark is pretty tight. Just loading the weights already feels squeezed at 456GB in NVFP4, barely leaving any room for KV cache.
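
The "pretty tight" point can be checked with simple arithmetic. The sketch below uses only figures quoted in this thread (456GB of NVFP4 weights, 128GB of unified memory per DGX Spark) and assumes the weights are split evenly across nodes. It ignores the OS, the inference runtime, and activations, all of which also draw on unified memory, so the real headroom for KV cache would be smaller than what it prints.

```python
import math

WEIGHTS_GB = 456       # NVFP4 checkpoint size, as quoted in the thread
SPARK_MEMORY_GB = 128  # unified memory per DGX Spark, as quoted in the thread


def min_nodes(weights_gb: float, node_memory_gb: float) -> int:
    """Smallest node count whose combined memory can hold the weights."""
    return math.ceil(weights_gb / node_memory_gb)


def headroom_per_node_gb(weights_gb: float, node_memory_gb: float, nodes: int) -> float:
    """Memory left on each node after an even split of the weights."""
    return (nodes * node_memory_gb - weights_gb) / nodes


if __name__ == "__main__":
    nodes = min_nodes(WEIGHTS_GB, SPARK_MEMORY_GB)
    spare = headroom_per_node_gb(WEIGHTS_GB, SPARK_MEMORY_GB, nodes)
    print(f"minimum nodes: {nodes}")
    print(f"headroom per node before any overhead: {spare:.1f} GB")
```

With these inputs the minimum is four nodes, leaving 14GB per node (56GB in total) before any overhead, which is consistent with the concern about KV cache space.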

@i_loder @HuggingPapers 4

@GaryIngle77 @HuggingPapers @grok 100k to run this locally... If I win the lottery, there will be signs.

@HuggingPapers Getting that close to FP8 accuracy after quantizing down for Blackwell is a pretty strong result.

Ah, DGX Sparks! The desktop Blackwell beasts (128GB unified, 1 PFLOP FP4 each).
That 753B MoE (NVFP4) fits nicely in **4x DGX Sparks** with near-full quality, per early reports. Single Spark handles ~200B models; two get you to ~400B. Tensor-parallel across the cluster for the big one.
Previous “sparks” joke was electrical; this is way cooler. How many are you planning to rack up? ⚡