
GM Metric Detects Mode Collapse in Language Models Better Than Perplexity


Original post

Emiel Hoogeboom @emiel_hoogeboom

More interesting: a top-p (nucleus) sampling sweep on a small autoregressive (AR) LM. Perplexity (PPL) drops monotonically as you tighten p. GM is U-shaped, with a minimum near p = 0.90; as p drops below that, mode collapse pushes GM back up. GM can of course still be gamed, but it catches more than PPL does.

6:49 AM · May 16, 2026
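A minimal sketch of the top-p (nucleus) truncation being swept, assuming the standard definition: keep the smallest set of highest-probability tokens whose cumulative mass reaches p, then renormalize. The distribution `next_token` and the helpers `top_p_filter` and `entropy` are hypothetical and for illustration only; this computes neither GM nor PPL. It shows one mechanism behind the sweep: as p is tightened, the entropy of the truncated distribution can only fall, so samples concentrate on the most likely tokens.

```python
import math


def top_p_filter(probs, p):
    """Keep the smallest set of highest-probability tokens whose cumulative
    mass reaches p, then renormalize so the kept probabilities sum to 1."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        total += prob
        if total >= p:
            break
    return {token: prob / total for token, prob in kept}


def entropy(dist):
    """Shannon entropy of a {token: probability} dict, in nats."""
    return -sum(q * math.log(q) for q in dist.values() if q > 0)


# Hypothetical next-token distribution, for illustration only.
next_token = {"the": 0.40, "a": 0.25, "this": 0.15, "that": 0.10, "one": 0.06, "its": 0.04}

# Sweep p from loose to tight, as in the thread.
for p in (1.0, 0.95, 0.90, 0.80, 0.60, 0.40):
    nucleus = top_p_filter(next_token, p)
    print(f"p={p:.2f}  kept={len(nucleus)}  entropy={entropy(nucleus):.3f} nats")
```

At p = 0.40 only "the" survives, which is the degenerate end of the sweep where sampling becomes effectively greedy.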


Emiel Hoogeboom @emiel_hoogeboom

Sanity check: take real OWT (OpenWebText) and tile one row across the batch (extreme repetition). PPL is almost the same: 14.0 vs. 14.5 for real data. Looks fine! GM jumps from ~0 to +7.0. Collapse caught. The commonly used token entropy cannot catch this.

1:49 PM · May 16, 2026 · 3 Views
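A toy sketch of why per-sequence statistics miss the tiling test: each row's token entropy is computed independently, so a batch made of one repeated row looks about as healthy as a real batch. The sentences and all helper names are hypothetical stand-ins. `distinct_rows` is a crude batch-level diversity check used only for contrast; it is not the GM metric from the thread, which the posts do not define.

```python
import math
from collections import Counter


def token_entropy(row):
    """Entropy (nats) of the unigram distribution within a single sequence."""
    counts = Counter(row)
    n = len(row)
    return -sum(c / n * math.log(c / n) for c in counts.values())


def mean_row_entropy(batch):
    """Per-row statistic averaged over the batch; blind to cross-row repetition."""
    return sum(token_entropy(r) for r in batch) / len(batch)


def distinct_rows(batch):
    """Fraction of unique sequences in the batch (hypothetical batch-level check)."""
    return len({tuple(r) for r in batch}) / len(batch)


# Hypothetical "real" batch of tokenized rows.
real = [
    "the cat sat on the mat".split(),
    "a dog ran across the park".split(),
    "prices rose sharply in may".split(),
    "she opened the old wooden door".split(),
]
# The sanity check: one real row tiled across the whole batch.
tiled = [real[0]] * len(real)

print("real ", round(mean_row_entropy(real), 3), distinct_rows(real))
print("tiled", round(mean_row_entropy(tiled), 3), distinct_rows(tiled))
```

Mean per-row entropy shifts only slightly between the two batches, while the batch-level check drops from 1.0 to 0.25. A per-row average of the tiled batch only ever sees one plausible sequence; detecting the repetition requires comparing samples across the batch.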

Emiel Hoogeboom @emiel_hoogeboom

Less extreme: a top-p (nucleus) sweep on a small AR LM. PPL drops monotonically as you tighten p.

GM is U-shaped, with a minimum near p = 0.90. As p drops below that, mode collapse pushes GM back up.

Note that GM can still be gamed; it's just more difficult.

Emiel Hoogeboom @emiel_hoogeboom

Sanity check: take real OWT, tile one row across the batch (extreme repetition). PPL barely budges: 14.0 vs 14.5 for real data. Looks fine! GM jumps from ~0 to +7.0. Collapse caught.

1:57 PM · May 16, 2026 · 446 Views
