The hype cycle has completely swung the other way
No, there is NO WAY anyone can use only Chinese models
You can't even do debugging with GLM 5.2, which doesn't have basic vision capabilities
Closed models are still required for serious work
Many users criticized GLM 5.2 for omitting vision capabilities, viewing it as a major step backward that blocks standalone and agent workflows, while some defended its strong text performance and expected quick multimodal fixes.
Chinese open-source AI models don't work because they don't have any vision capabilities and can't "see images"
Unless this is fixed, they can't really be used for serious work on a standalone basis
So yeah, we are still dependent on closed source 🤷
GLM 5.2 doesn't quite work in the real world because it doesn't have any vision capabilities and can't "see images"
Unless this is fixed, it can't really be used for serious work on a standalone basis
The other Chinese models aren't good enough... So yeah, we are still dependent on closed source 🤷

@bindureddy Couldn't a smaller multimodal model, some Qwen VL for example, describe images to GLM 5.2, or would that combo be too slow or expensive?

@bindureddy Just have a vision model translate to text and feed output to language model.
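
A minimal sketch of the caption-then-prompt idea suggested in the replies above, in Python. Everything here is hypothetical: `answer_about_image`, `describe_image`, and `complete_text` are made-up names, and the two stand-in functions only imitate a vision model and a text-only model so the example runs without any real API.

```python
from typing import Callable


def answer_about_image(
    image: bytes,
    question: str,
    describe_image: Callable[[bytes], str],
    complete_text: Callable[[str], str],
) -> str:
    """Caption the image with a vision model, then ask a text-only model."""
    description = describe_image(image)
    prompt = (
        "Description of the attached image, written by a separate vision model:\n"
        f"{description}\n\n"
        f"Question: {question}"
    )
    return complete_text(prompt)


# Stand-in callables (assumptions) so the sketch runs with no model or network.
def fake_vision_model(image: bytes) -> str:
    return f"a screenshot of {len(image)} bytes showing a stack trace"


def fake_text_model(prompt: str) -> str:
    return prompt.upper()


reply = answer_about_image(
    b"\x89PNG", "What is failing?", fake_vision_model, fake_text_model
)
```

In this setup the text model only sees whatever the caption happens to mention, so detail the vision model leaves out is lost before the question is answered.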

@bindureddy yeah, but this is for sure a fixable problem no?

@claudeultramax Not the same as having native vision capabilities

@bindureddy Why is that so? 5.1 had vision if I'm right, I fed it screenshots a lot and it understood the context and nuance... Haven't tried 5.2 yet

@bindureddy Gemma4 multimodal is an open model, and pretty good at vision

@krisshkodrani yes, not ideal

@bindureddy Counterpoint, most serious work doesn't need vision at all, text reasoning alone covers 80% of real use cases..

@ku_ds17868 yes, but it will take a couple of months...

@bindureddy I think they will do that next. Anyhow, we're glad we at least have the model for text generation.

@bindureddy Missing vision capabilities is a massive step backward for an iteration. Serious enterprise pipelines rely heavily on multimodal workflows now. Stripping that out basically kills its standalone utility

@bindureddy that makes total sense, vision is such a huge blocker.

@bindureddy It's my frontend model of choice with Codex for backend. I created a "design arena" with Hermes and beat Opus on blind testing. How do you do frontend work?

@bindureddy Interesting take

@bindureddy A tool that takes a screenshot as BMP, serializes it to JSON, and diffs it against a known-good one. hehe I did that with a C++ browser project. Works surprisingly well.
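
A rough sketch of that screenshot-diff idea, written in Python rather than the C++ the reply mentions. It assumes uncompressed, bottom-up 24-bit BMP files, and all function names and the JSON layout are made up for illustration; this is not the commenter's tool.

```python
import json
import struct


def make_bmp(pixels):
    """Build a 24-bit uncompressed BMP from rows of (r, g, b), top row first."""
    height, width = len(pixels), len(pixels[0])
    pad = (-width * 3) % 4  # each row is padded to a multiple of 4 bytes
    body = b""
    for row in reversed(pixels):  # BMP stores rows bottom-up
        body += b"".join(bytes((b, g, r)) for r, g, b in row) + b"\x00" * pad
    file_header = struct.pack("<2sIHHI", b"BM", 54 + len(body), 0, 0, 54)
    info_header = struct.pack(
        "<IiiHHIIiiII", 40, width, height, 1, 24, 0, len(body), 2835, 2835, 0, 0
    )
    return file_header + info_header + body


def bmp_to_json(data):
    """Serialize the pixels of a BMP to JSON as rows of [r, g, b], top row first."""
    magic, _, _, _, offset = struct.unpack_from("<2sIHHI", data, 0)
    _, width, height, _, bpp, compression = struct.unpack_from("<IiiHHI", data, 14)
    if magic != b"BM" or bpp != 24 or compression != 0 or height <= 0:
        raise ValueError("only bottom-up, uncompressed 24-bit BMPs are handled")
    stride = (width * 3 + 3) // 4 * 4
    rows = []
    for y in range(height - 1, -1, -1):  # last stored row is the top of the image
        start = offset + y * stride
        row = data[start:start + width * 3]
        rows.append([[row[i + 2], row[i + 1], row[i]] for i in range(0, len(row), 3)])
    return json.dumps({"width": width, "height": height, "pixels": rows})


def diff_pixels(known_json, current_json):
    """Return (x, y) coordinates where the current screenshot differs from the known one."""
    known, current = json.loads(known_json), json.loads(current_json)
    if (known["width"], known["height"]) != (current["width"], current["height"]):
        raise ValueError("screenshots have different sizes")
    return [
        (x, y)
        for y, (row_a, row_b) in enumerate(zip(known["pixels"], current["pixels"]))
        for x, (a, b) in enumerate(zip(row_a, row_b))
        if a != b
    ]


white, red = (255, 255, 255), (255, 0, 0)
known = bmp_to_json(make_bmp([[white, white], [white, white]]))
current = bmp_to_json(make_bmp([[white, red], [white, white]]))
changed = diff_pixels(known, current)
```

The list of changed coordinates is plain text, so it can be pasted into a prompt for a text-only model without any image input.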

@bindureddy it is easy to add. Qwen 3.6 27B and 35B A3B have really good vision capabilities.