Antirez releases quantized DeepSeek V4 Flash model on Hugging Face

Antirez published a quantized DeepSeek V4 Flash model on Hugging Face under the repository antirez/deepseek-v4-gguf. The 80.8 GiB GGUF file uses IQ2_XXS and Q2_K quantization for the routed experts, with Q8_0, F16, and F32 formats for the remaining layers. The resulting model runs inference on a single RTX Pro 6000 GPU and is comparable in on-disk size to gpt-oss-120B. Community observers view the release as a test of how much knowledge the quantized variant retains.
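As a sketch, a GGUF release in this layout could be run locally with llama.cpp. The repository name comes from the post; the `.gguf` filename below is a hypothetical placeholder, so check the repository's file listing for the actual name.

```shell
# Sketch: fetch and run the quantized model with llama.cpp.
# NOTE: the .gguf filename is an assumption -- consult the repo's
# file listing on Hugging Face for the real one.
huggingface-cli download antirez/deepseek-v4-gguf \
    deepseek-v4-flash-iq2_xxs.gguf --local-dir ./models

# -ngl 99 offloads all layers to the GPU; per the post, the 80.8 GiB
# model fits on a single RTX Pro 6000.
llama-cli -m ./models/deepseek-v4-flash-iq2_xxs.gguf \
    -ngl 99 -p "Hello"
```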

Original post

Already reasonably established that it preserves a lot of general capability, interesting to test this on *knowledge* against gpt-oss-120B, as they're actually close in on-disk size.

7:07 AM · May 16, 2026
Antirez releases quantized DeepSeek V4 Flash model on Hugging Face · Digg