Qwen 3.6 27B is great but I have found Gemma 4 31B much more reliable. It doesn't overthink, uses the right tools only when needed, and can run faster thanks to its superior MTP design. A larger model running faster than a smaller one, that's crazy!!
Positive users praise Gemma 4 31B for better structured outputs and reliability than Qwen 3.6, while negative users report preferring Qwen or finding Gemma inferior on their own benchmarks.

@OrganicGPT If you are on a Mac, you can try the optiq quants; they are usually much more accurate and faster: http://mlx-optiq.com

@OrganicGPT Interesting how perspectives differ. For pretty much the same reasons, I prefer Qwen 3.6 over Gemma.

@OrganicGPT My experience is the total opposite. Gemma models (both QAT and normal versions) keep getting stuck in generation loops. Even if I turn off thinking in LMS, the “wait, I need to do X… Wait, I need to do Y” output constantly pollutes the context. Qwen 3.6, both the 35B and the 27B, is still gold to me!

@Youssofal_ I wonder why Qwen is more resilient to quantization given that both are dense. Could it have to do with DeltaNet?

@OrganicGPT I’m the complete opposite: Gemma has been trash for me on every benchmark compared to Qwen.

@OrganicGPT I think that’s the case, @bnjmn_marie posts great benchmarks on this.

@OrganicGPT My bias: for agent runs I care less about benchmark rank and more about “does it stop thinking when a simple tool call is enough.” Google’s QAT/MTP push is interesting because it attacks the boring local constraint: memory + latency. If Gemma is calmer there, that’s a real edge.

@OrganicGPT Qwen is way more resilient to quantisation than Gemma.
I usually run INT8 or hybrid INT4-BF16
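
For a rough sense of what those formats mean in memory terms, here is a minimal sketch that estimates weight-only footprint from parameter count and bytes per parameter. It ignores KV cache, activations, and quantization scale/zero-point overhead, and the 10% BF16 share used for the hybrid case is purely an illustrative assumption, not a figure from this thread.

```python
# Weight-only memory estimate. Assumptions: decimal GB (1e9 bytes),
# no KV cache, no activations, no quantization metadata overhead.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gb(params_billion: float, fmt: str) -> float:
    """Approximate weight size in GB for a model stored in one format."""
    return params_billion * BYTES_PER_PARAM[fmt]


def hybrid_gb(params_billion: float, bf16_fraction: float) -> float:
    """Approximate weight size when a fraction of parameters stays in
    BF16 and the rest is INT4. The fraction is a hypothetical input."""
    kept = params_billion * bf16_fraction
    quantized = params_billion - kept
    return weight_gb(kept, "bf16") + weight_gb(quantized, "int4")


if __name__ == "__main__":
    for name, size in (("27B", 27.0), ("31B", 31.0)):
        for fmt in ("bf16", "int8", "int4"):
            print(f"{name} {fmt}: ~{weight_gb(size, fmt):.1f} GB")
        # 10% kept in BF16 is an illustrative guess only.
        print(f"{name} int4/bf16 hybrid: ~{hybrid_gb(size, 0.10):.1f} GB")
```

Under these assumptions a 27B model needs about 54 GB of weights in BF16 and about 13.5 GB in INT4, which is why quantization resilience matters so much for local use.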

@Timur_Yessenov That's exactly the advantage of Gemma! I use it for agentic tasks, and not overthinking before every simple task is really important. Qwen isn't that efficient in comparison.

@OrganicGPT Using Gemma 4 with Ollama and Zed.

@OrganicGPT @grok Help me explain to him: Qwen is a reasoning model; Gemma 4 is not a reasoning model but has a thinking mode.

Output is a question of taste for these two families of models. However, I have some simplified observations:
1. Dense vs MoE: Qwen 3.6 27B is dense, where all parameters activate; Gemma 4 31B is MoE, where not all parameters activate.
2. llama.cpp doesn't support MTP for Gemma 4 31B yet, or do you use some other engine for running the models?

@Youssofal_ Do you use the quantized versions? I'm talking about the full bf16 models; maybe the results change after quantization.

@latent_node Thanks, I'll try it on Mac Studio. My post was about an RTX Pro 6000 with vLLM tho.

@raccoon_builds @grok both are reasoning models.

@OrganicGPT I can't run Gemma 4 31B, but I can run some comparable MoE models like Qwen 3.6 35B A3B. It all comes down to your system constraints. I am VRAM constrained, but I have a decent amount of RAM. BOOM = usable. It's not mind-blowing speed, but very usable.
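
A minimal sketch of why an MoE model can stay usable when it spills into system RAM: all weights still have to be stored somewhere, but only the active parameters are read per token. The 35B total and roughly 3B active figures are an assumption read off the "35B A3B" name, and the function name and INT4 choice are hypothetical.

```python
# Rough MoE profile. Assumptions: decimal GB, weight-only, and
# "35B A3B" taken to mean ~35B total and ~3B active parameters.
def moe_profile(total_b: float, active_b: float, bytes_per_param: float) -> dict:
    """Return stored weight size and the share of weights used per token."""
    return {
        "weights_gb": total_b * bytes_per_param,        # must fit in RAM + VRAM
        "active_gb": active_b * bytes_per_param,        # read per generated token
        "active_fraction": active_b / total_b,
    }


if __name__ == "__main__":
    profile = moe_profile(total_b=35.0, active_b=3.0, bytes_per_param=0.5)  # INT4
    print(f"stored weights: ~{profile['weights_gb']:.1f} GB")
    print(f"touched per token: ~{profile['active_gb']:.1f} GB "
          f"({profile['active_fraction']:.0%} of the model)")
```

The small per-token share is what makes RAM offload tolerable here, whereas a dense model of similar size reads every weight for every token.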

@SlimTradeyBaby You can't be serious, maybe you're not using the same 31B Gemma model?

@OrganicGPT Sounds great, I'll give it a try on my Mac.

@nonRealBrandon That's Qwen and DeepSeek V4 Pro

@OrganicGPT Qwen 3.6 27B generally edges out or clearly beats Gemma 4 31B in head-to-heads, especially where it counts for a lot of users (coding/agentic stuff).