Since I'm between jobs, I've been having a lot of fun vibe-coding with public tooling.
First drop: a clean PyTorch impl of the Gradient Moment metric from our recent paper (arXiv:2603.20155). https://github.com/ehoogeboom/gradient-moment

The motivation: for models without a tractable likelihood (distilled discrete diffusion, in our case), generative PPL is easy to game by sampling at low entropy. You get "better" PPL just by being more repetitive.
GM uses the gradient of a reference LM's NLL instead.
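To give a feel for the shape of the computation, here's a minimal PyTorch sketch. To be clear: this is not the repo's API. I'm assuming GM means something like the second moment of the reference LM's parameter-space NLL gradient, averaged over generated samples; the GPT-2 reference, function names, and normalization here are all stand-ins, so check the paper/repo for the real definition.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stand-in reference LM; the paper's choice may differ.
ref_name = "gpt2"
tok = AutoTokenizer.from_pretrained(ref_name)
ref = AutoModelForCausalLM.from_pretrained(ref_name)
ref.eval()

def gradient_moment(samples: list[str]) -> float:
    """Hypothetical GM: mean squared norm of the reference LM's
    NLL gradient (w.r.t. its parameters) over generated samples."""
    moments = []
    for text in samples:
        ids = tok(text, return_tensors="pt").input_ids
        ref.zero_grad(set_to_none=True)
        # NLL of the sample under the reference LM
        # (HF causal LMs shift labels internally).
        nll = ref(ids, labels=ids).loss
        nll.backward()
        sq_norm = sum(
            (p.grad ** 2).sum() for p in ref.parameters() if p.grad is not None
        )
        moments.append(sq_norm.item())
    return sum(moments) / len(moments)

# Repetitive text can score a deceptively low NLL (the PPL-gaming
# failure mode); the idea is that the gradient signal is harder
# to shrink the same way.
print(gradient_moment([
    "The cat sat on the mat.",
    "the the the the the the the",
]))
```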