glm 4.7 flash
i have a feeling this one's going to become my default model once the llama.cpp fixes are all ironed out. the output is great if you're willing to wait through the thinking phase, and i've been able to hit about 100 t/s with a 3090 + 32gb of system ram. qwen3-vl-30b-a3b-instruct will probably rem
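
for anyone curious what that kind of 3090 + system-ram split looks like in practice, here's a rough sketch of loading a gguf quant with llama-cpp-python and offloading as many layers as fit in vram — the filename and layer count below are placeholders, not my exact setup:

```python
# rough sketch, assuming llama-cpp-python is installed and a local GGUF is on disk
# (the filename and n_gpu_layers value are placeholders, not a tested config)
from llama_cpp import Llama

llm = Llama(
    model_path="./glm-4.7-flash-Q4_K_M.gguf",  # hypothetical quant filename
    n_gpu_layers=30,   # offload what fits in the 3090's 24 GB; the rest stays in system RAM
    n_ctx=8192,        # context window
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "hello"}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```

bumping n_gpu_layers up or down is the main knob for trading vram headroom against tokens/sec, so your numbers will differ from mine depending on quant and context size.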