"GLM produces cleaner code" — ❌ Not supported
GLM's pass@1 is 6 pp lower than Opus's. More verification ≠ more correct.
"GLM verifies more" — ✅ Partially confirmed
But the verification is atomized differently: GLM fires one sql_execute call per check, while Opus batches the same checks into fewer dbt show --inline calls. Same coverage, different shape.
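
To make "same coverage, different shape" concrete, here is a minimal sketch of the two styles. It does not call sql_execute or dbt show; it uses an in-memory SQLite database as a stand-in, and the `orders` table, the three checks, and all function names are hypothetical, chosen only for illustration.

```python
import sqlite3

# Hypothetical checks: (label, SQL returning a single number).
CHECKS = [
    ("row_count", "SELECT COUNT(*) FROM orders"),
    ("null_ids", "SELECT COUNT(*) FROM orders WHERE id IS NULL"),
    ("dup_ids", "SELECT COUNT(id) - COUNT(DISTINCT id) FROM orders"),
]


def atomized(checks):
    """One statement per check, so N checks cost N calls."""
    return [sql for _, sql in checks]


def batched(checks):
    """Fold every check into one statement, so N checks cost 1 call."""
    parts = [
        f"SELECT '{label}' AS check_name, ({sql}) AS value"
        for label, sql in checks
    ]
    return "\nUNION ALL\n".join(parts)


def make_demo_db():
    """Stand-in database (assumption: not the real warehouse)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER)")
    conn.executemany(
        "INSERT INTO orders VALUES (?)", [(1,), (2,), (2,), (None,)]
    )
    return conn


def run_atomized(conn, checks):
    # Each check is its own round trip.
    labels = [label for label, _ in checks]
    results = [conn.execute(sql).fetchone()[0] for sql in atomized(checks)]
    return dict(zip(labels, results))


def run_batched(conn, checks):
    # All checks come back as rows of a single result set.
    return dict(conn.execute(batched(checks)).fetchall())
```

Both runners return the same label-to-value mapping for the same checks; only the number of calls differs, which is why a raw count of verification calls overstates the difference in what was actually verified.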
