Good new hard benchmark to monitor, and one where the US frontier is solidly ahead (yes, even Gemini, though I guess its STEM/deep multimodal priors make up for sheer stupidity. Hopefully we'll see GLM-5.3V). @scaling01 add to the roster
Surface Evolver Bench
This actually looks like a pretty cool benchmark!





