Tech · 15h ago

Peyman Milanfar argues that latent-space models restrict generative AI progress, urging a shift to pixel-space architectures

Luca Ambrogioni questioned whether pixel-space models scale to high-resolution video


Original post

Latent-space models are a cage we’ve boxed ourselves into. The reason for using them in the first place was always efficiency, but we lost the plot and forgot that the speed costs us in terms of progress. It’s time to move on to pixel-space models for the next state of the art.

9:19 PM · Jun 24, 2026 · 21.9K Views

Sentiment

Positive replies back the call to shift AI models from latent space to pixel space, arguing that latent space is overly restrictive; negative replies defend latent space for robotics applications or dismiss the idea as nonsensical.

Positive: 40.0% · Negative: 60.0% (based on 5 comments with sentiment)


Posts from X

Most Activity

Most views: 1.2K

Ulugbek S. Kamilov@prof_kamilov

@docmilanfar Yeah, the box is too tight

18h ago

Most bookmarks: 2

Allen Schmaltz@Allen_Schmaltz

@docmilanfar Here's a template for when the plot gets lost on what to plot next:

11h ago

Most likes: 8

Zhi-Song Liu_AIML@liuzhisong_cv

@docmilanfar It would be interesting to see how pixel-space models handle global climate data with resolutions over 10k.

17h ago

Most replies: 1

Mohammad Mahdi Azizi@gebersys

@docmilanfar I partially disagree: 1. There is no natural, universal pixel-space distance. And probably there never will be. 2. The problem isn't the latent space itself; it's the failure to consider the predictable and unpredictable parts of images (data). The problem lies with the decoders.

16h ago

Subir Varma@subirv

@docmilanfar Why are pixel space models preferable?

17h ago

Nir Zabari@nirzabari

@docmilanfar Frozen VAEs may disappear. Compressed representations won’t. Some lower-dimensional latent will probably remain, at least until the next major compute/modeling breakthrough.

16h ago

wontfix@DadMakingGames

@docmilanfar instead of abandoning latent space, should we focus on better, dynamically scaling latent topologies? I agree in relation to VAEs

16h ago

Luca Ambrogioni@LucaAmb

@docmilanfar Even for high-res videos?

15h ago

Everett Kleven@everettkleven

@docmilanfar Atoms space is pretty cool

15h ago

Maxime Alvarez@qu3tzalify

@docmilanfar Meanwhile, robotics and world-model applications are suffering from pixel-space issues; latent space is the way to go.

15h ago

Kuldeep Kulkarni@KK_Ilkal

@docmilanfar Totally agree!

18h ago

Felix Goldberg 🟠@FelixGoldberg1

@docmilanfar That would be the opposite call from the one LeCun is making, right? Or did I miss something?

14h ago

Utilitarian Princess@UtilityPrincess

@docmilanfar Am I missing something? Pixel-space seems like a sparse representation of the world just like a lot of latent spaces

7h ago

W. Thomas Payne@w_t_payne

@docmilanfar What? This doesn't make any sense.

13h ago

Meysam Zare@RegretfulSam

@DadMakingGames @docmilanfar it's not the topology that matters, it's the geometry you

8h ago

Tomás Flood@tomasrflood

@docmilanfar Lol

12h ago

Marc@MarcJSchmidt

@subirv @docmilanfar They are not. Latents can take many forms; perhaps OP is talking about a very specific one that they claim is hard to train/scale.

15h ago

Mohammad Mahdi Azizi@gebersys

@docmilanfar In my opinion, decoders should decode using generative priors. We should revisit VAEs, but with the tools developed in the field of conditional generation.

16h ago

Saul Buitrago@sbuitrago

@liuzhisong_cv @docmilanfar Maybe we don't need to train on the whole domain, only on the "local" physics, and then deploy multiple agents to solve the full domain. In the end, the physics are equivariant in space and time.

13h ago

Taro Bushidō@techietaro

@docmilanfar Agree on the stagnation. Latent-space models traded interpretability for efficiency. Pixel-space models could prioritize feature predictability over realism, but compute remains the bottleneck.

18h ago