Tech · 8h ago

Nvidia releases Qwen3.6-27B-NVFP4, using its Blackwell-era 4-bit format to cut memory requirements by up to 4x versus FP16

The release supports high-throughput inference via vLLM integration

Original post

Julien Chaumond @julien_c · #335 in Tech

just dropped

https://huggingface.co/nvidia/Qwen3.6-27B-NVFP4

9:57 AM · Jun 30, 2026 · 149 Views

Sentiment

Positive commenters are excited to try NVIDIA's new FP4-quantized Qwen model on their RTX hardware, while negative commenters are disappointed by missing features such as tcgen05 on sm120.

Positive: 50.0% · Negative: 50.0%

3 comments with sentiment.

Related links

nvidia/Qwen3.6-27B-NVFP4 · Hugging Face

Via Hugging Face

Posts from X

Most Activity

Views: 3.4K · Bookmarks: 4 · Replies: 3

Brian Roemmele @BrianRoemmele

Boom!

Nvidia Drops Qwen3.6-27B-NVFP4 on Hugging Face

Hugging Face CTO Julien Chaumond announced today that Nvidia has released Qwen3.6-27B-NVFP4, a highly efficient quantized version of the 27-billion-parameter model from Alibaba’s recently launched Qwen3.6 series.

NVFP4 is Nvidia’s proprietary 4-bit floating-point format introduced with the Blackwell GPU architecture.

It delivers up to 4× memory reduction compared to FP16 while preserving strong accuracy, making large models far more practical on Blackwell-based systems like DGX Spark and RTX PRO 6000 cards.
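To make the "up to 4×" figure concrete, here is a minimal plain-Python sketch that simulates NVFP4-style block quantization and estimates the weight footprint of a 27B-parameter model. It assumes the publicly described NVFP4 layout (4-bit E2M1 values in blocks of 16, each block sharing one FP8 E4M3 scale). It simplifies by keeping the block scale as a full-precision float instead of rounding it to E4M3, ignores the per-tensor FP32 scale, and counts weights only (no KV cache, activations, or layers left unquantized). All function names are hypothetical illustrations, not an NVIDIA, vLLM, or llama.cpp API.

```python
# Hypothetical helpers illustrating NVFP4-style quantization; not an NVIDIA API.

# Non-negative magnitudes representable by a 4-bit E2M1 float.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 16    # elements that share one scale factor (assumed layout)
SCALE_BITS = 8     # one FP8 (E4M3) scale per block (assumed layout)
ELEMENT_BITS = 4   # one E2M1 value per element


def quantize_block(block):
    """Quantize a block of floats to E2M1 values with one shared scale.

    Simplification: the scale stays a Python float instead of being
    rounded to FP8 E4M3 as real NVFP4 would do.
    """
    amax = max(abs(x) for x in block)
    scale = amax / 6.0 if amax > 0 else 1.0  # map the largest magnitude to 6.0
    codes = []
    for x in block:
        mag = abs(x) / scale
        # Round to the nearest representable E2M1 magnitude.
        nearest = min(E2M1_GRID, key=lambda g: abs(g - mag))
        codes.append(nearest if x >= 0 else -nearest)
    return scale, codes


def dequantize_block(scale, codes):
    """Recover approximate floats by multiplying each code by the block scale."""
    return [c * scale for c in codes]


def bits_per_weight():
    # 4 bits per element plus the 8-bit block scale amortized over 16 elements.
    return ELEMENT_BITS + SCALE_BITS / BLOCK_SIZE


def weight_gb(num_params, bits):
    """Weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits / 8 / 1e9


if __name__ == "__main__":
    block = [0.1 * i - 0.7 for i in range(BLOCK_SIZE)]
    scale, codes = quantize_block(block)
    restored = dequantize_block(scale, codes)
    print("max abs error:", max(abs(a - b) for a, b in zip(block, restored)))

    params = 27e9
    fp16 = weight_gb(params, 16)
    nvfp4 = weight_gb(params, bits_per_weight())
    print(f"FP16 weights:  {fp16:.1f} GB")
    print(f"NVFP4 weights: {nvfp4:.1f} GB ({fp16 / nvfp4:.2f}x smaller)")
```

Under these assumptions the weights take 54 GB in FP16 versus about 15.2 GB in NVFP4 (4.5 effective bits per weight), roughly a 3.6× reduction. The nominal 16-bit to 4-bit ratio is 4×, and the shared block scales account for the gap, which is why the reduction is quoted as "up to" 4×.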

New benchmarks show excellent throughput with tools such as vLLM, especially when combined with speculative decoding techniques like MTP (multi-token prediction).

The release gives developers immediate access to a powerful, coding-focused 27B model that runs efficiently on consumer and enterprise Blackwell GPUs.

It joins a growing ecosystem of Qwen3.6 quantizations and underscores Nvidia’s push to accelerate open AI model deployment on its platform.

Likes: 16

Julien Chaumond @julien_c

Remember that recent llama.cpp builds can dispatch GEMMs directly to the Blackwell FP4 Tensor Cores.

Lotto @LottoLabs

Ohhh we need some Blackwell bros to run this!

Nvidia coming through with the goat quants

https://huggingface.co/nvidia/Qwen3.6-27B-NVFP4

Retweets: 1

Noé Flandre @NoeFlandre

@julien_c In case you don’t know, as I didn’t, NVFP4 is Nvidia’s floating point format. It was introduced with the Blackwell GPU architecture. It reduces memory by up to 4x vs FP16 while preserving accuracy

Sebastian S. Cocioba🪄🌷 @ATinyGreenCell

@julien_c Can't wait to try this on my http://tiiny.ai thingggg

Raghuvaran_mr @RaghuModupalli

@julien_c Will it work for AMD ?? I guess no ??

Julien Chaumond @julien_c

❤️❤️

matheus. @matheusbuild

Can anyone imagine life without Hugging Face 🤗 There's no going back

TrueStory @quantumleap68

@julien_c any dgx spark TPS numbers on it?

P A @PA80339646

@julien_c Is there a NVFP8 blessed by Nvidia?

Adam 🤗 @AdamMolnarHF

@julien_c big if true

CBir @c__bir

@julien_c works with mtp as well?

Jakove_HR @JakoveHr

@RaghuModupalli @julien_c Not NVFP4, bro, definitely not. It's Blackwell-only.

Jakove_HR @JakoveHr

@RaghuModupalli @julien_c Get a GGUF for llama.cpp, I guess

42loops @42loops

@NoeFlandre @julien_c I think my RTX 6000 Pro is going to be very happy…

and if I understand correctly, I can now replace it with just an RTX 3090?

If so, that's damn good news

Zack Angelo @zackangelo

@julien_c no tcgen05 on sm120 though :(

jon3k @jon3k

@RaghuModupalli @julien_c Just run an unsloth gguf at whatever quant you want.

Products Heart @productsheartco

@julien_c !!
