Qwen 3.6 27B is great, but I have found Gemma 4 31B much more reliable. It doesn't overthink, uses the right tools only when needed, and can run faster thanks to its superior MTP design. A larger model running faster than a smaller one, that's crazy!!
Supporters praise Gemma 4 31B for better structured outputs and reliability than Qwen 3.6, while critics report preferring Qwen or finding Gemma inferior on their benchmarks.

@OrganicGPT If you are on a Mac, you can try the optiq quants; they are usually much more accurate and faster - http://mlx-optiq.com

@OrganicGPT Interesting how perspectives differ, for pretty much the same reasons I prefer Qwen 3.6 over Gemma.

@OrganicGPT Mine is the total opposite. Gemma models (both QAT and normal versions) get stuck in generation loops. Even if I turn off thinking in LMS, the "wait, I need to do X…. Wait, I need to do Y" keeps polluting the context. Qwen 3.6 35B and 27B are both still gold to me!

@Youssofal_ I wonder why Qwen is more resilient to quantization, given that both are dense. Could it have to do with DeltaNet?

@OrganicGPT I'm the complete opposite; Gemma has been trash for me on every benchmark compared to Qwen.

@OrganicGPT I think that’s the case, @bnjmn_marie posts great benchmarks on this.

@OrganicGPT My bias: for agent runs I care less about benchmark rank and more about “does it stop thinking when a simple tool call is enough.” Google’s QAT/MTP push is interesting because it attacks the boring local constraint: memory + latency. If Gemma is calmer there, that’s a real edge.

@OrganicGPT Qwen is way more resilient to quantisation than Gemma.
I usually run INT8 or hybrid INT4-BF16.

@Timur_Yessenov That's exactly the advantage of Gemma! I use it for agentic tasks, and not having it overthink before every simple task is really important. Qwen isn't as efficient in comparison.

@OrganicGPT Using Gemma 4 with Ollama and Zed.

@OrganicGPT @grok Help me explain this to him: Qwen is a reasoning model, while Gemma 4 is not a reasoning model but has a thinking mode.

Output quality is a matter of taste for these two model families. However, I have some simplified observations:
1. Dense vs MoE: Qwen 3.6 27B is dense, where all parameters activate; Gemma 4 31B is MoE, where not all parameters activate.
2. llama.cpp doesn't support MTP for Gemma 4 31B yet, so you would need another engine to run the model with it.

@Youssofal_ Do you use the quantized versions? I'm talking about the full BF16 models; maybe the results change after quantization.

@latent_node Thanks, I'll try it on my Mac Studio. My post was about an RTX Pro 6000 with vLLM, though.

@raccoon_builds @grok both are reasoning models.

@OrganicGPT I can't run Gemma 4 31B, but I can run some comparable MoE models like Qwen 3.6 35B A3B. It all comes down to your system constraints. I am VRAM-constrained, but I have a decent amount of RAM. BOOM = usable. It's not mind-blowing speed, but it's very usable.

@SlimTradeyBaby You can't be serious, maybe you're not using the same 31B Gemma model?

@OrganicGPT Sounds great, I'll give it a try on my Mac.

@nonRealBrandon That's Qwen and DeepSeek V4 Pro

@OrganicGPT Qwen 3.6 27B generally edges out or clearly beats Gemma 4 31B in head-to-heads, especially where it counts for a lot of users (coding/agentic stuff).