Lisan al Gaib says GLM-5.2's PostTrainBench SOTA relies on 38% more evaluation probing than Opus 4.8 · Digg