"GLM takes more turns" — ✅ Confirmed
GLM averages 99 turns vs 80 for Opus (about 24% more), and 40 vs 29 execution-style calls per trial (about 38% more). This is real.
We ran 103 dbt tasks × 3 trials on both GLM-5.2 and Opus-4.7.
Pass@3: 66% (GLM) vs 67% (Opus), effectively tied. Pass@1: 47.6% (GLM) vs 53.7% (Opus), so Opus wins by 6.1 pp.
GLM is noisier per trial, but over three attempts it solves nearly as many distinct tasks, so it stays competitive at k=3.
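
For readers who want to see how a model can lose at Pass@1 yet tie at Pass@3, here is a minimal sketch of the pass@k arithmetic. It assumes the standard unbiased estimator, 1 - C(n-c, k) / C(n, k), with n = 3 trials per task; the note does not say exactly how its figures were computed, so treat this as an illustration rather than the scoring script. The function names and the toy data are hypothetical.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one task: n trials run, c of them passed."""
    if n - c < k:
        # Fewer than k failures exist, so any k-sample contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

def suite_pass_at_k(passes_per_task: list[int], n: int, k: int) -> float:
    """Average the per-task estimate over the whole task suite."""
    return sum(pass_at_k(n, c, k) for c in passes_per_task) / len(passes_per_task)

# Hypothetical toy data (NOT the real results): passing trials out of 3
# for four tasks.
toy = [3, 1, 0, 2]

p1 = suite_pass_at_k(toy, n=3, k=1)  # equals the mean per-trial pass rate
p3 = suite_pass_at_k(toy, n=3, k=3)  # share of tasks solved at least once

print(f"pass@1 = {p1:.3f}, pass@3 = {p3:.3f}")
```

With n = 3, pass@1 reduces to the mean per-trial pass rate, while pass@3 only asks whether a task was solved at least once. A task that passes 1 trial in 3 counts as 0.33 toward pass@1 but as a full solve toward pass@3, which is how a noisier model can close the gap at k=3.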
