Apple MLR researchers introduce Normalizing Trajectory Models
Apple MLR researchers introduced Normalizing Trajectory Models, a method that combines normalizing flows with trajectory modeling. The approach enables high-quality few-step image generation while preserving exact trajectory likelihood. Jiatao Gu, staff ML researcher at Apple MLR, posted the paper on Hugging Face. The work examines whether fast generative models can stay likelihood-based and offers an alternative to methods that sacrifice likelihood for speed.
Can fast generative models still be likelihood-based?
Excited to share our new work @Apple MLR -- Normalizing Trajectory Models,
a step toward high-quality few-step generation with exact trajectory likelihood, powered by normalizing flows.
Paper: https://huggingface.co/papers/2605.08078 [1/9]

Diffusion and flow-matching models typically generate through many small steps, where simple denoising transitions are a reasonable approximation.
But when we compress generation into only a few coarse steps, the reverse transitions become much more complex.
[2/9]
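To make the step-count tradeoff concrete, here is a minimal sketch of a flow-matching Euler sampler. The `velocity_model` callable is a hypothetical stand-in for a trained network v(x, t), not an interface from the paper: with many steps each update is a small, nearly Gaussian transition, while with only a few steps each update must cover a large, complex jump.

```python
import torch

def euler_sample(velocity_model, x_T, num_steps):
    """Integrate a flow-matching ODE from noise x_T (at t=1) to data (t=0).

    `velocity_model` is a hypothetical stand-in for a trained network
    v(x, t); it is not an interface from the paper.
    """
    x = x_T
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = 1.0 - i * dt
        t_batch = torch.full((x.shape[0],), t, device=x.device)
        x = x - dt * velocity_model(x, t_batch)  # one Euler step toward t=0
    return x

# num_steps=1000: each transition is tiny and nearly Gaussian.
# num_steps=4: each transition must cover a large, multimodal jump,
# which is where simple denoising approximations break down.
```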

Most fast generators rely on distillation, consistency training, or adversarial/distribution-level objectives.
They are great at producing strong samples, but they often move away from the likelihood-based formulation that made diffusion and flow models principled & scalable.
[3/9]

We take a different view: few-step generation can be trajectory density modeling.
Instead of only learning a fast sampler, we learn a likelihood-based model over the full generative trajectory, so fast generation remains tied to an explicit probabilistic objective.
[4/9]
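As a worked equation (our notation, not necessarily the paper's): if generation is discretized into K coarse steps, the chain rule factorizes the exact trajectory log-likelihood into a prior term plus a sum of per-transition terms, so each coarse transition can be trained against an explicit probabilistic objective.

```latex
% Assumed notation (ours, not necessarily the paper's): x_{t_K} is noise,
% x_{t_0} is data, and each factor is one coarse reverse transition.
\log p(x_{t_0}, \ldots, x_{t_K})
  = \log p(x_{t_K}) + \sum_{k=1}^{K} \log p(x_{t_{k-1}} \mid x_{t_k})
```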

Concretely, NTM models each transition p(x_s | x_t) with expressive conditional normalizing flows.
This gives the model more flexibility than simple Gaussian denoising, allowing complex few-step transitions while keeping exact likelihood tractable over the full trajectory.
[5/9]
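A minimal sketch of the underlying mechanism, assuming a single conditional affine transform rather than the paper's actual flow architecture: the scale and shift of the transition are predicted from x_t, and the change-of-variables formula gives an exact log p(x_s | x_t).

```python
import torch
import torch.nn as nn

class ConditionalAffineFlow(nn.Module):
    """Minimal sketch, NOT the paper's architecture: a single conditional
    affine transform whose per-dimension scale and shift are predicted
    from the previous state x_t, so log p(x_s | x_t) is exact via the
    change-of-variables formula."""

    def __init__(self, dim):
        super().__init__()
        # predicts per-dimension (log_scale, shift) from x_t
        self.cond_net = nn.Linear(dim, 2 * dim)

    def log_prob(self, x_s, x_t):
        log_scale, shift = self.cond_net(x_t).chunk(2, dim=-1)
        z = (x_s - shift) * torch.exp(-log_scale)   # invert x_s -> base z
        base = torch.distributions.Normal(0.0, 1.0)
        log_p_z = base.log_prob(z).sum(-1)
        log_det = -log_scale.sum(-1)                # log|dz/dx_s|
        return log_p_z + log_det                    # exact log p(x_s | x_t)
```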

Architecturally, NTM combines shallow invertible autoregressive transporters with an image-level predictor.
This allows training from scratch, or initializing from pretrained diffusion models and then refining them into a likelihood-based trajectory model for fast generation.
[6/9]
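The exact layer design is in the paper; the sketch below is only a loose structural illustration under our own assumptions: an image-level predictor computes a context from x_t, and a shallow stack of invertible affine layers (standing in for the autoregressive transporters) maps base noise z to a sample x_s conditioned on that context.

```python
import torch
import torch.nn as nn

class TransitionSketch(nn.Module):
    """Loose structural illustration under our own assumptions, not the
    paper's exact layers: an image-level predictor computes a context
    from x_t, and a shallow stack of invertible affine layers (standing
    in for the autoregressive transporters) maps base noise z to x_s."""

    def __init__(self, dim, depth=2):
        super().__init__()
        self.predictor = nn.Sequential(
            nn.Linear(dim, dim), nn.SiLU(), nn.Linear(dim, dim))
        # each shallow layer predicts (log_scale, shift) from the context
        self.layers = nn.ModuleList(
            nn.Linear(dim, 2 * dim) for _ in range(depth))

    def forward(self, z, x_t):
        ctx = self.predictor(x_t)              # image-level prediction
        for layer in self.layers:              # shallow invertible stack
            log_scale, shift = layer(ctx).chunk(2, dim=-1)
            z = z * torch.exp(log_scale) + shift
        return z                               # sample from p(x_s | x_t)
```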

Moreover, because NTM has exact trajectory likelihood, it provides direct and stable trajectory-score/denoising targets.
We can use them to train a lightweight denoiser, so sampling can bypass the costly shallow AR flow blocks while keeping NTM’s trajectory knowledge.
[7/9]
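A hedged sketch of the self-distillation idea, where `ntm_target` is a hypothetical callable returning the trajectory model's denoising target for (x_t, t); the actual training recipe is in the paper.

```python
import torch
import torch.nn.functional as F

def distill_step(denoiser, ntm_target, x_t, t, optimizer):
    """One self-distillation update. `ntm_target` is a hypothetical
    callable returning the trajectory model's denoising target for
    (x_t, t); the lightweight `denoiser` regresses onto it so that
    sampling can skip the flow blocks entirely."""
    with torch.no_grad():
        target = ntm_target(x_t, t)    # stable target from exact likelihood
    pred = denoiser(x_t, t)
    loss = F.mse_loss(pred, target)    # simple regression objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```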

Empirically, NTM shows that likelihood training and fast generation can coexist: strong few-step text-to-image results, whether trained from scratch or initialized from pretrained flow-matching models, plus a self-distilled 4-step denoiser derived from NTM’s stable trajectory targets.
[8/9]
