
For these tests, I used the same 1,000 prompts as in the Qwen3.5 4B comparison. I also ran local 6-bit quants of both Qwen3.6 35B A3B and Gemma 4 26B A4B, with thinking off. This is meant to capture 'local, low-latency chat', the same regime GPT-4o excelled at.


