Hear me out: People used to soyface about novel coding evals, where Chyna/open models were not just behind but garbage. GLM closed most of that gap. Now we look at combined metrics like ECI, or "pure reasoning" like ARC. I predict this gap, too, will prove to be surprisingly fragile.
"omg omg GLM-5.2 is beating fable. china is catching up"
chill out and listen to Lisan:
> slightly ahead of Opus 4.5
> behind GPT-5.2, Gemini 3 Pro and Opus 4.6
