To be clear, this is not a V-JEPA or VideoMAE diss; I'm just resurfacing the point that "pure videogen" models may indeed learn an explicit model of the world/physics as a byproduct.
Also cc @mapo1: we chatted about this, and you also intuitively pushed back against this claim.
The paper is "The invisible hand of physics", from a surprisingly diverse set of authors (Parsa Esmati, @Somjit77), posted just a few days ago: https://arxiv.org/abs/2606.05328
I learned about it from a nice talk by @katjahofmann today.
The paper from earlier in the year is by @soniajoseph_ et al.: https://arxiv.org/abs/2602.07050


