Antirez releases quantized DeepSeek V4 Flash model on Hugging Face
Antirez published a quantized DeepSeek V4 Flash model on Hugging Face under the repository antirez/deepseek-v4-gguf. The 80.8 GiB file applies IQ2_XXS and Q2_K quantization to the routed experts, keeping Q8_0, F16, and F32 formats for the remaining layers. The quantized model runs inference on a single RTX Pro 6000 GPU and lands at a size comparable to gpt-oss-120B. Community observers view the release as a test of how much knowledge the quantized variant retains.
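
For readers who want to experiment with the release, here is a minimal sketch of downloading and loading the GGUF file using the huggingface_hub and llama-cpp-python libraries. Only the repository id antirez/deepseek-v4-gguf comes from the article; the filename, context size, and prompt are illustrative assumptions, not details from the release.

```python
# Sketch: fetch a GGUF file from the Hugging Face repo and run it locally.
# Assumes llama-cpp-python is installed with GPU support; the filename below
# is hypothetical, since the article does not list the files in the repo.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download the quantized model file (filename is an assumption).
model_path = hf_hub_download(
    repo_id="antirez/deepseek-v4-gguf",
    filename="deepseek-v4-flash-iq2_xxs.gguf",  # hypothetical name
)

# Offload all layers to the GPU; per the article, the 80.8 GiB model fits
# on a single RTX Pro 6000 (a 96 GB card).
llm = Llama(model_path=model_path, n_gpu_layers=-1, n_ctx=4096)

# A quick knowledge probe, in the spirit of checking what the heavily
# quantized variant still retains.
out = llm("Name the author of the Redis database.", max_tokens=32)
print(out["choices"][0]["text"])
```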
