Never mind lmao, Flash can do it too
There are tasks on which Chinese models are 2 years behind. Sonnet 3.5 could do this *without* thinking. Many frontier models today can too. GLM takes 20 thousand tokens to get it ALMOST exactly right, working from first principles and fumbling in its CoT. Still, it does get it.

