ok, here's how GLM 5.2 performs on a bench it definitely didn't see, and where GLM 5.1 scored 0.0%. It lands closer to Opus 4.8 than Sonnet 4.6. I hope their confidence seems more credible now.
GLM 5.1 scores zero btw. There's no way they benchmaxed this thing directly. We shall see how 5.2 performs; I'd be surprised if it landed below MiniMax.











