Entropix creator xjdr says GLM 5.2 identified complex C++ and Rust bugs that GPT-5.5 xhigh repeatedly missed
xjdr noted that GLM 5.2 is not better overall.
Users are excited that GLM 5.2 caught complex C++ and Rust bugs that GPT-5.5 missed, and some plan to add it to their workflows for refactors and code reviews.
diversity is our strength
glm 5.2 has now, on more than one occasion, found complex and detailed bugs in C++ and Rust that gpt5.5 xhigh has repeatedly missed. i'm not saying it's better, but it's closer than i would have thought and also different in mostly positive ways

@_xjdr I still find myself going to Claude or Codex briefly for things like code reviews, or obviously vision tasks, but I did add a rough /code-review to my NCode as well. GLM is very nice.

@_xjdr @patience_cave how does acceleration make you feel?? 😎😎

@_xjdr Were those directed bug queries or self-discovered in adjacent tasks?

@_xjdr Perhaps they trained heavily on systems languages and systems programming tasks instead of SWE Bench Pro (Verified)++ Python and TypeScript slop? Might have to check it out.

@_xjdr I'll be giving it a try at the end of the month. Currently using 5.1 to do major refactors and it does well. I have some ambitious rust and c++ projects. About time a model is good in rust!

@_xjdr It’s becoming more obvious to me that cursor/SpaceX still has a legitimate shot at producing the best model, whereas a couple of months ago I thought that ship had sailed.

@_xjdr production rust code is way messier than textbook examples. glm probably has more real-world deployments in its training data.
exactly why you can't just pick a model by reputation. it's always problem-specific.

@_xjdr Because of things like this, I never see a reason not to have more than one model take a pass at things. shit, even gemini gets wins in sometimes.

@_xjdr The interesting word there is 'different' more than 'closer'. Model diversity is starting to matter like test diversity: a second system that fails differently can beat a stronger default in real debugging loops.

@_xjdr model monoculture is risky. what kind of bugs is GLM catching that 5.5 misses?