GLM 4.7 Flash
I have a feeling this one is going to become my default model once the llama.cpp fixes are all ironed out. The output is great if you're willing to wait through the thinking phase, and I've been able to hit about 100 t/s with a 3090 plus 32 GB of system RAM.
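For anyone curious what a partial-offload setup like that looks like, here's a minimal sketch using llama-cpp-python (the GGUF filename, layer count, and context size below are placeholders, not a tuned config):

```python
from llama_cpp import Llama

# Hypothetical filename; whatever quant you grabbed goes here.
llm = Llama(
    model_path="glm-4.7-flash.Q4_K_M.gguf",
    n_gpu_layers=40,   # partial offload: tune until it fits in the 3090's 24 GB
    n_ctx=8192,        # remaining layers + KV spill into system RAM
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "hello"}],
)
print(out["choices"][0]["message"]["content"])
```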
Qwen3-VL-30B-A3B-Instruct will probably remain on the back burner for vision tasks.
How has this model been for all of you? Has anyone used tool calling with opencode or Claude Code?
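If you want to poke at tool calling directly before wiring up either agent, something like this against llama-server's OpenAI-compatible endpoint should work (the port, model name, and the read_file tool are made-up stand-ins, not what opencode or Claude Code actually registers):

```python
from openai import OpenAI

# llama-server exposes an OpenAI-compatible API; port and key are assumptions.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-local")

tools = [{
    "type": "function",
    "function": {
        "name": "read_file",  # hypothetical tool for illustration
        "description": "Read a file from the workspace",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

resp = client.chat.completions.create(
    model="glm-4.7-flash",  # whatever name the server reports
    messages=[{"role": "user", "content": "Open README.md"}],
    tools=tools,
)
# If the chat template is being applied correctly, this should show a
# structured read_file call rather than the tool JSON leaking into content.
print(resp.choices[0].message.tool_calls)
```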