4) Qwen 3.7 Max is a very strong full-coverage run, but less hygienic than Fable or GPT‑5.5 xhigh.
It covers 14/14 problems, gets accepted solved/settled results on Problems 1, 4, 5, and 7, and has good accepted partials on Problems 2, 8, 11, 195, and 208. It is probably the strongest “broad coverage” new run after Kimi K2.7, especially because it has no rejected solved claims.
The issue is overstatement: it misses the Problem 3 Green-Tao application and uses answer-like language on Problems 9, 10, and 12, where the proof gaps remain material.
So the judgment is: high mathematical competence, high coverage, but weaker proof hygiene.