
Gradient Matching Metric Detects Repetition Collapse Missed by Perplexity

Sanity check: take real OWT, tile one row across the batch (extreme repetition). PPL is almost the same: 14.0 vs 14.5 for real data. Looks fine! GM jumps from ~0 to +7.0. Collapse caught. The commonly used token-entropy check also misses this.

6:49 AM · May 16, 2026
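The tiling check is easy to reproduce on a toy model. The sketch below makes no assumptions about the paper's actual implementation: the unigram "reference LM", the `gm_score` function, and all numbers are illustrative. The gradient-matching score here is the norm of the gap between the mean NLL gradient on a candidate batch and on a held-out real batch; per-token perplexity barely moves under tiling, while the gradient gap grows because a single repeated row cannot match the diversity of real data.

```python
import numpy as np

rng = np.random.default_rng(0)
V, B, T = 50, 64, 32  # vocab size, batch size, sequence length

# Toy stand-in for a reference LM: a unigram softmax model whose output
# distribution equals the data distribution (illustrative only).
p_data = rng.dirichlet(np.ones(V))
probs = p_data  # softmax(logits) of the toy reference model

def sample_batch(tiled=False):
    if tiled:
        row = rng.choice(V, size=T, p=p_data)
        return np.tile(row, (B, 1))  # one row repeated across the batch
    return rng.choice(V, size=(B, T), p=p_data)

def ppl(batch):
    # perplexity = exp(mean NLL under the reference model)
    return np.exp(-np.log(probs[batch]).mean())

def mean_grad(batch):
    # for a softmax model, d NLL / d logits = probs - onehot(token);
    # averaging over the batch gives probs - empirical token frequencies
    freq = np.bincount(batch.ravel(), minlength=V) / batch.size
    return probs - freq

real = sample_batch()
tiled = sample_batch(tiled=True)
ref_grad = mean_grad(sample_batch())  # gradient on a held-out real batch

def gm_score(batch):
    # hypothetical gradient-matching score: distance to the real-data gradient
    return np.linalg.norm(mean_grad(batch) - ref_grad)

print(f"PPL: real {ppl(real):.2f}, tiled {ppl(tiled):.2f}")  # typically close
print(f"GM:  real {gm_score(real):.3f}, tiled {gm_score(tiled):.3f}")
```

The tiled batch carries only T distinct tokens of evidence, so its empirical frequencies sit far from the reference model's distribution, which the gradient exposes directly.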


Emiel Hoogeboom @emiel_hoogeboom

The motivation: for models without a tractable likelihood (distilled discrete diffusion, in our case), generative PPL is easy to game by sampling at low entropy. You get "better" PPL by being more repetitive. GM uses the gradient of a reference LM's NLL instead.

1:49 PM · May 16, 2026
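The gaming effect can be shown in a few lines with a toy unigram "reference LM" (an illustrative setup, not the authors'): sharpening the sampling distribution makes the generator more repetitive, yet its generative PPL under the reference model *improves*, because low-entropy samples concentrate on the tokens the reference model rates most likely.

```python
import numpy as np

rng = np.random.default_rng(1)
V, N = 50, 4096  # vocab size, number of generated tokens
p_ref = rng.dirichlet(np.ones(V))  # toy unigram reference LM

def gen_ppl(temperature):
    # generator = reference distribution sharpened by 1/temperature;
    # low temperature mimics low-entropy (repetitive) sampling
    q = p_ref ** (1.0 / temperature)
    q /= q.sum()
    samples = rng.choice(V, size=N, p=q)
    # generative PPL: reference-model perplexity of the generated tokens
    return np.exp(-np.log(p_ref[samples]).mean())

for t in (1.0, 0.5, 0.1):
    print(f"temperature {t}: generative PPL {gen_ppl(t):.2f}")
# lower temperature -> more repetitive output, yet lower ("better") PPL
```

At temperature 1.0 the score equals exp of the reference entropy; pushing temperature toward 0 drives it toward the single most likely token's perplexity, which is why PPL alone rewards collapse.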
