
GM Metric Detects Mode Collapse in Language Models Better Than Perplexity

Original post

More interesting: top-p nucleus sweep on a small AR LM. PPL drops monotonically as you tighten p. GM is U-shaped, with a minimum near p=0.90. As p drops below that, mode-collapse pushes GM back up. GM can of course still be gamed, but it catches more.

6:49 AM · May 16, 2026
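For readers unfamiliar with the sweep being described: top-p (nucleus) truncation keeps only the smallest set of tokens whose cumulative probability reaches p, so tightening p shrinks the candidate set and cuts sample diversity. A minimal sketch of the truncation step (standard nucleus filtering, not the thread's exact model or setup):

```python
import numpy as np

def top_p_filter(probs, p):
    """Zero out all but the smallest set of tokens whose cumulative
    probability reaches p (nucleus truncation), then renormalize."""
    order = np.argsort(probs)[::-1]        # token indices, most probable first
    csum = np.cumsum(probs[order])
    cutoff = np.searchsorted(csum, p) + 1  # smallest prefix covering mass p
    keep = order[:cutoff]
    out = np.zeros_like(probs)
    out[keep] = probs[keep]
    return out / out.sum()

# A skewed next-token distribution: tightening p prunes the tail,
# which is the mechanism behind the mode collapse at small p.
probs = np.array([0.50, 0.25, 0.15, 0.07, 0.03])
for p in (0.95, 0.90, 0.50):
    filt = top_p_filter(probs, p)
    print(p, int((filt > 0).sum()), "tokens kept")
```

At p=0.50 only the single most probable token survives, which is why an aggressive sweep collapses generations onto a few modes.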


Emiel Hoogeboom @emiel_hoogeboom

Sanity check: take real OWT, tile one row across the batch (extreme repetition). PPL barely moves: 14.0 on the tiled batch vs 14.5 on real data. Looks fine! GM jumps from ~0 to +7.0. Collapse caught. The commonly used token-entropy diagnostic cannot catch this either.

1:49 PM · May 16, 2026 · 3 Views
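The tiling sanity check works because perplexity is an average over per-token likelihoods, so it is blind to repetition *across* samples in a batch; only a cross-sample statistic can see it. A toy demonstration of that blindness (the `unique_frac` diversity proxy below is a hypothetical stand-in for illustration, not the GM metric from the thread):

```python
import numpy as np

rng = np.random.default_rng(0)

def ppl(nll_per_token):
    # Corpus perplexity: exp of the mean per-token negative log-likelihood.
    return float(np.exp(nll_per_token.mean()))

# Stand-in for a model's per-token NLLs on a batch of 8 sequences x 32 tokens.
batch_nll = rng.uniform(2.0, 3.5, size=(8, 32))

# "Collapsed" batch: one row tiled 8 times (extreme cross-sample repetition).
tiled_nll = np.tile(batch_nll[:1], (8, 1))

print(ppl(batch_nll), ppl(tiled_nll))
# The tiled batch's PPL is just that one row's PPL -- perfectly ordinary
# in-distribution values, even though the batch has zero diversity.

# Any cross-sample statistic flags it instantly, e.g. fraction of unique rows:
unique_frac = lambda x: len({r.tobytes() for r in x}) / len(x)
print(unique_frac(batch_nll), unique_frac(tiled_nll))  # 1.0 vs 0.125
```

This is why a per-token score like PPL (or token entropy) can look healthy on a fully collapsed batch, while a batch-level metric like the thread's GM moves sharply.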
