The most capable Chinese model tested on ARC-AGI-2 is Kimi K2.5, released on January 27, 2026, with a score of 11.8%. I think GLM 5.2 ought to score at least 50%. This is getting a bit silly.
@teortaxesTex Pretty much the only benchmark I care about when looking at these open-weight models is ARC-AGI-2.
GLM 5 scored 4.9%. Still no GLM 5.1 or 5.2 score.






