Apple MLR researchers introduce Normalizing Trajectory Models
Apple MLR researchers introduced Normalizing Trajectory Models, a method that combines normalizing flows with trajectory modeling. The approach enables high-quality few-step image generation while preserving exact trajectory likelihood. Jiatao Gu, staff ML researcher at Apple MLR, posted the paper on Hugging Face. The work examines whether fast generative models can stay likelihood-based and offers an alternative to methods that sacrifice likelihood for speed.
Can fast generative models still be likelihood-based?
Excited to share our new work @Apple MLR -- Normalizing Trajectory Models,
a step toward high-quality few-step generation with exact trajectory likelihood, powered by normalizing flows.
Paper: https://huggingface.co/papers/2605.08078 [1/9]

Diffusion and flow-matching models typically generate through many small steps, where simple denoising transitions are a reasonable approximation.
But when we compress generation into only a few coarse steps, the reverse transitions become much more complex.
[2/9]
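To make the step-count tradeoff concrete, here is a minimal sketch of a flow-matching Euler sampler. The `velocity_model` callable is a hypothetical stand-in for a trained network v(x, t), not an interface from the paper: with many steps each update is a small, nearly Gaussian transition, while with only a few steps each update must cover a large, complex jump.

```python
import torch

def euler_sample(velocity_model, x_T, num_steps):
    """Integrate a flow-matching ODE from noise x_T (at t=1) to data (t=0).

    `velocity_model` is a hypothetical stand-in for a trained network
    v(x, t); it is not an interface from the paper.
    """
    x = x_T
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = 1.0 - i * dt
        t_batch = torch.full((x.shape[0],), t, device=x.device)
        x = x - dt * velocity_model(x, t_batch)  # one Euler step toward t=0
    return x

# num_steps=1000: each transition is tiny and nearly Gaussian.
# num_steps=4: each transition must cover a large, multimodal jump,
# which is where simple denoising approximations break down.
```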

Most fast generators rely on distillation, consistency training, or adversarial/distribution-level objectives.
They are great at producing strong samples, but they often move away from the likelihood-based formulation that made diffusion and flow models principled & scalable.
[3/9]

We take a different view: few-step generation can be trajectory density modeling.
Instead of only learning a fast sampler, we learn a likelihood-based model over the full generative trajectory, so fast generation remains tied to an explicit probabilistic objective.
[4/9]
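As a worked equation (our notation, not necessarily the paper's): if generation is discretized into K coarse steps, the chain rule factorizes the exact trajectory log-likelihood into a prior term plus a sum of per-transition terms, so each coarse transition can be trained against an explicit probabilistic objective.

```latex
% Assumed notation (ours, not necessarily the paper's): x_{t_K} is noise,
% x_{t_0} is data, and each factor is one coarse reverse transition.
\log p(x_{t_0}, \ldots, x_{t_K})
  = \log p(x_{t_K}) + \sum_{k=1}^{K} \log p(x_{t_{k-1}} \mid x_{t_k})
```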

Concretely, NTM models each transition p(x_s | x_t) with expressive conditional normalizing flows.
This gives the model more flexibility than simple Gaussian denoising, allowing complex few-step transitions while keeping exact likelihood tractable over the full trajectory.
[5/9]
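A minimal sketch of the underlying mechanism, assuming a single conditional affine transform rather than the paper's actual flow architecture: the scale and shift of the transition are predicted from x_t, and the change-of-variables formula gives an exact log p(x_s | x_t).

```python
import torch
import torch.nn as nn

class ConditionalAffineFlow(nn.Module):
    """Minimal sketch, NOT the paper's architecture: a single conditional
    affine transform whose per-dimension scale and shift are predicted
    from the previous state x_t, so log p(x_s | x_t) is exact via the
    change-of-variables formula."""

    def __init__(self, dim):
        super().__init__()
        # predicts per-dimension (log_scale, shift) from x_t
        self.cond_net = nn.Linear(dim, 2 * dim)

    def log_prob(self, x_s, x_t):
        log_scale, shift = self.cond_net(x_t).chunk(2, dim=-1)
        z = (x_s - shift) * torch.exp(-log_scale)   # invert x_s -> base z
        base = torch.distributions.Normal(0.0, 1.0)
        log_p_z = base.log_prob(z).sum(-1)
        log_det = -log_scale.sum(-1)                # log|dz/dx_s|
        return log_p_z + log_det                    # exact log p(x_s | x_t)
```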

Architecturally, NTM combines shallow invertible autoregressive transporters with an image-level predictor.
This allows training from scratch, or initializing from pretrained diffusion models and then refining them into a likelihood-based trajectory model for fast generation.
[6/9]
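The exact layer design is in the paper; the sketch below is only a loose structural illustration under our own assumptions: an image-level predictor computes a context from x_t, and a shallow stack of invertible affine layers (standing in for the autoregressive transporters) maps base noise z to a sample x_s conditioned on that context.

```python
import torch
import torch.nn as nn

class TransitionSketch(nn.Module):
    """Loose structural illustration under our own assumptions, not the
    paper's exact layers: an image-level predictor computes a context
    from x_t, and a shallow stack of invertible affine layers (standing
    in for the autoregressive transporters) maps base noise z to x_s."""

    def __init__(self, dim, depth=2):
        super().__init__()
        self.predictor = nn.Sequential(
            nn.Linear(dim, dim), nn.SiLU(), nn.Linear(dim, dim))
        # each shallow layer predicts (log_scale, shift) from the context
        self.layers = nn.ModuleList(
            nn.Linear(dim, 2 * dim) for _ in range(depth))

    def forward(self, z, x_t):
        ctx = self.predictor(x_t)              # image-level prediction
        for layer in self.layers:              # shallow invertible stack
            log_scale, shift = layer(ctx).chunk(2, dim=-1)
            z = z * torch.exp(log_scale) + shift
        return z                               # sample from p(x_s | x_t)
```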

Moreover, because NTM has exact trajectory likelihood, it provides direct and stable trajectory-score/denoising targets.
We can use them to train a lightweight denoiser, so sampling can bypass the costly shallow AR flow blocks while keeping NTM’s trajectory knowledge.
[7/9]
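A hedged sketch of the self-distillation idea, where `ntm_target` is a hypothetical callable returning the trajectory model's denoising target for (x_t, t); the actual training recipe is in the paper.

```python
import torch
import torch.nn.functional as F

def distill_step(denoiser, ntm_target, x_t, t, optimizer):
    """One self-distillation update. `ntm_target` is a hypothetical
    callable returning the trajectory model's denoising target for
    (x_t, t); the lightweight `denoiser` regresses onto it so that
    sampling can skip the flow blocks entirely."""
    with torch.no_grad():
        target = ntm_target(x_t, t)    # stable target from exact likelihood
    pred = denoiser(x_t, t)
    loss = F.mse_loss(pred, target)    # simple regression objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```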

Empirically, NTM shows that likelihood training and fast generation can coexist: strong few-step text-to-image results, whether trained from scratch or initialized from pretrained flow-matching models, plus a self-distilled 4-step denoiser derived from NTM’s stable trajectory targets.
[8/9]
