
Flow Model Evaluation Standards Overstate Progress, Researcher Finds

Original post

Flow models are a promising alternative to autoregression. But the current standard of evaluating flow models is broken. The reported 3x improvement in 1024-step PPL since 2023 is closer to 1.1x if you control for sample entropy. (1/12)

8:24 AM · May 13, 2026

I'm going to re-retweet this, because it's important! There are a few pitfalls when evaluating diffusion language models, highlighted in these two recent blog posts:
- https://patrickpynadath1.github.io/blog/eval_methodology/
- https://samacquaviva.com/projects/flow-evals/

Both are worth a read if you have an interest in this space!

Sam Acquaviva (@Sam_Acqua)

Flow models are a promising alternative to autoregression. But the current standard of evaluating flow models is broken. The reported 3x improvement in 1024-step PPL since 2023 is closer to 1.1x if you control for sample entropy. (1/12)

3:24 PM · May 13, 2026 · 24.9K Views
5:57 PM · May 15, 2026 · 15.9K Views