Respectfully, I reject @ilyasut's assertion.
This represents “understanding” if and only if one accepts a particularly sparse meaning of the word.
Understanding, as the word is used for sentient systems, involves not just compression but also abductive reasoning, theory-building, and the ability to explain a thing's context, its consequences, and its limits.
To understand something is not just to regurgitate synonyms for what that thing is, but to be able to assert what that thing is not and why that is so.
A model trained for next-token prediction is forced to build compressed representations of latent structure in text. Ilya Sutskever correctly refers to this phenomenon as understanding. Here, a model trained for next-step sensor prediction, running on a robot that has proprioception and touch sensors but no vision, is forced to build compressed representations of latent structure in the physical world. The robot becomes aware of the shape of external objects. That is, it understands the physical properties of the external world that enable it to make better next-step sensor predictions.
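To make the idea concrete, here is a toy sketch. It is not the setup from the paper or the notebook. It is a hypothetical one-dimensional "fingertip" with proprioception only (touch is left out for brevity), and the names `WALL`, `world_step`, `predict`, and `train` are made up for illustration. The model is never told where the wall is. It only tries to predict its next position reading, and the wall location falls out of the prediction error.

```python
import random

WALL = 0.7  # true wall position; hidden from the model (hypothetical value)


def world_step(pos, action):
    """Ground-truth physics: the fingertip moves by `action`, clipped to [0, WALL]."""
    return max(0.0, min(pos + action, WALL))


def predict(pos, action, wall_est):
    """Model's next-step position prediction, using its current wall estimate."""
    return max(0.0, min(pos + action, wall_est))


def train(steps=5000, lr=0.1, seed=0):
    """Learn `wall_est` purely from next-step prediction error."""
    rng = random.Random(seed)
    wall_est = 0.2  # deliberately wrong initial guess
    pos = 0.0
    for _ in range(steps):
        action = rng.uniform(-0.2, 0.2)  # random exploratory motion
        next_pos = world_step(pos, action)
        err = predict(pos, action, wall_est) - next_pos
        # wall_est only affects the prediction when the predicted motion
        # is clipped by it, so the squared-error gradient is zero otherwise.
        if pos + action > wall_est:
            wall_est -= lr * err
        pos = next_pos
    return wall_est


if __name__ == "__main__":
    print("learned wall estimate:", train())
```

The only training signal is the gap between predicted and observed sensor readings, yet the learned parameter ends up encoding a property of the external world. A real system would use a neural network and many sensor channels instead of a single hand-picked parameter.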
This research was previously done over the course of a month by a diverse team of expert engineers at DeepMind, including stars like @notmisha and @yuvaltassa. Remarkably, this reproduction with a completely different robot took only a few hours to implement using Codex.
The automatic creation of physical environments by AI will likely lead to huge advances in areas of science and engineering that use physical simulators or twin models.
The paper and notebook are available at ❤️∀ https://love4all.ai/