just dropped
https://huggingface.co/nvidia/Qwen3.6-27B-NVFP4
The release supports high-throughput inference through its vLLM integration.
Enthusiastic users are excited to try NVIDIA's new FP4-quantized Qwen model on their RTX hardware, while critics are disappointed by missing features such as tcgen05 on sm120.
Boom!
Nvidia Drops Qwen3.6-27B-NVFP4 on Hugging Face
Hugging Face CTO Julien Chaumond announced today that Nvidia has released the Qwen3.6-27B-NVFP4 model, a highly efficient quantized version of Alibaba’s recently launched 27-billion-parameter Qwen3.6 series.
NVFP4 is Nvidia’s proprietary 4-bit floating-point format introduced with the Blackwell GPU architecture.
It cuts weight memory by up to 4× compared to FP16 (closer to 3.5× once the per-block scaling factors are counted) while preserving strong accuracy, making large models far more practical on Blackwell-based systems like DGX Spark and RTX PRO 6000 cards.
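To put the memory claim in concrete terms, here is a rough back-of-the-envelope sketch for a 27B-parameter model. It assumes, beyond what the post states, that NVFP4 stores 4-bit values in blocks of 16 that share one 8-bit scale, and that every weight is quantized. Real checkpoints often keep some layers in higher precision, so treat the figures as an estimate rather than the size of the actual download.

```python
# Rough weight-memory estimate: FP16 vs NVFP4 for a nominal 27B model.
# Assumptions (not stated in the post): block size 16, one 8-bit scale per
# block, all weights quantized, KV cache and activations ignored.

PARAMS = 27e9        # nominal parameter count ("27B")
FP16_BITS = 16       # bits per weight in FP16
FP4_BITS = 4         # bits per quantized value
BLOCK_SIZE = 16      # assumed number of values sharing one scale
SCALE_BITS = 8       # assumed size of each per-block scale


def bits_per_weight(value_bits, block_size, scale_bits):
    """Effective bits per weight once the shared scale is amortized."""
    return value_bits + scale_bits / block_size


def weight_gigabytes(params, bits):
    """Weight storage in GB (10^9 bytes) for a given bits-per-weight."""
    return params * bits / 8 / 1e9


nvfp4_bits = bits_per_weight(FP4_BITS, BLOCK_SIZE, SCALE_BITS)
fp16_gb = weight_gigabytes(PARAMS, FP16_BITS)
nvfp4_gb = weight_gigabytes(PARAMS, nvfp4_bits)
ratio = fp16_gb / nvfp4_gb

print(f"NVFP4 effective bits/weight: {nvfp4_bits:.2f}")
print(f"FP16 weights:  {fp16_gb:.1f} GB")
print(f"NVFP4 weights: {nvfp4_gb:.1f} GB")
print(f"Reduction:     {ratio:.2f}x")
```

Under these assumptions the weights shrink from about 54 GB to about 15 GB, a reduction of roughly 3.6×. The headline 4× figure corresponds to counting only the 4-bit values and ignoring the scales.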
New benchmarks show excellent throughput with tools such as vLLM, especially when combined with speculative decoding techniques like MTP.
The release gives developers immediate access to a powerful, coding-focused 27B model that runs efficiently on consumer and enterprise Blackwell GPUs.
It joins a growing ecosystem of Qwen3.6 quantizations and underscores Nvidia’s push to accelerate open AI model deployment on its platform.
Remember that recent llama.cpp builds can dispatch GEMMs directly to the Blackwell FP4 Tensor Cores.
Ohhh we need some Blackwell bros to run this!
Nvidia coming through with the goat quants
https://huggingface.co/nvidia/Qwen3.6-27B-NVFP4

@julien_c In case you don’t know, as I didn’t, NVFP4 is Nvidia’s 4-bit floating-point format. It was introduced with the Blackwell GPU architecture. It reduces memory by up to 4x vs FP16 while preserving accuracy.

@julien_c Can't wait to try this on my http://tiiny.ai thingggg

@julien_c Will it work on AMD? I guess not?
❤️❤️
Can we even imagine a life without Hugging Face 🤗 There's no going back

@julien_c any dgx spark TPS numbers on it?

@julien_c Is there an NVFP8 blessed by Nvidia?

@julien_c big if true

@julien_c works with mtp as well?

@RaghuModupalli @julien_c Not NVFP4, bro, definitely not. It's even Blackwell-only.

@RaghuModupalli @julien_c Get a GGUF, I guess, and run it with llama.cpp

@NoeFlandre @julien_c I think my RTX 6000 Pro is going to be really happy…
and if I'm following correctly, I can now replace it with just an RTX 3090?
If so, that's damn good news

@julien_c no tcgen05 on sm120 though :(

@RaghuModupalli @julien_c Just run an unsloth gguf at whatever quant you want.

@julien_c !!
