I'm retweeting this again, because it's important! There are a few pitfalls when evaluating diffusion language models, highlighted in these two recent blog posts: - https://patrickpynadath1.github.io/blog/eval_methodology/ - https://samacquaviva.com/projects/flow-evals/
Both are worth a read if you have an interest in this space!
Flow models are a promising alternative to autoregression. But the current standard for evaluating flow models is broken. The reported 3x improvement in 1024-step PPL since 2023 shrinks to roughly 1.1x once you control for sample entropy. (1/12)
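A rough illustration of what "controlling for sample entropy" can mean in practice: report each generation's PPL under a judge model together with an entropy measure, instead of PPL alone, since low-entropy (degenerate, repetitive) samples score deceptively low PPL. This is a minimal sketch, not either post's exact protocol; the judge model choice, the entropy proxy, and the `samples` list are my assumptions.

```python
# Minimal sketch: judge PPL paired with an entropy proxy, assuming a
# HuggingFace-style causal LM as the judge. "gpt2-large" and `samples`
# are placeholders, not anything prescribed by the linked posts.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

judge_name = "gpt2-large"  # hypothetical judge model
tok = AutoTokenizer.from_pretrained(judge_name)
judge = AutoModelForCausalLM.from_pretrained(judge_name).eval()

@torch.no_grad()
def ppl_and_entropy(text: str):
    ids = tok(text, return_tensors="pt").input_ids
    logp = judge(ids).logits[:, :-1].log_softmax(-1)  # predicts tokens 1..N
    # Judge PPL: exp of the mean negative log-likelihood of the sample.
    nll = -logp.gather(-1, ids[:, 1:, None]).squeeze(-1).mean()
    # Entropy proxy: mean entropy of the judge's predictive distribution
    # over the sample. Low entropy flags degenerate text that would
    # otherwise look like a PPL win.
    ent = -(logp.exp() * logp).sum(-1).mean()
    return nll.exp().item(), ent.item()

# Report PPL alongside entropy (or bin samples by entropy) so apples
# are compared to apples across models and step counts.
for s in samples:  # `samples` = generations from the model under eval
    print(ppl_and_entropy(s))
```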