Fei-Fei Li warns that AI may be staring too hard at language models. The world is not just text on a screen. It is physical, visual, spatial, and always changing. Most of the economy runs on seeing, moving, interacting, and embodied intelligence.
Fei-Fei Li warns AI neglects physical and embodied world
Fei-Fei Li cautioned that AI development overemphasizes language models while neglecting physical, visual, spatial, and dynamic aspects of the real world. She noted that most economic activity depends on embodied intelligence through seeing, moving, and interacting with the environment. The comments were delivered during an onstage panel discussion, and video clips of the session spread on X.
Supportive commenters endorse Fei-Fei Li's warning that AI overfocuses on language models and call for embodied interaction and other modalities. Critics dismiss her as a talking head or sarcastically defend the role of language.
💯. Way too much focus on language models.

@rohanpaul_ai Seriously? Robotic components speak a "language" too. LLMs can speak every human language, C, Java, Python, and... and... and... You think it can't get input from cameras and take action by "talking" to robotic components?

@GaryMarcus Google is walking past the big AI companies because they are not just an AI company. Because they are quiet and don't try to get silly headlines. They're the only company close to what China is doing right now. Data center investors beware. https://cloud.google.com/blog/products/compute/tpu-8t-and-tpu-8i-technical-deep-dive

lol, in college I worked in the foreign language department as a secretary… and the Russian professor I didn’t have a crush on, but I would full body blush around him. Like I wasn’t attracted to him, but legit full body bright red, fumble my words. His wife also worked in the department. She was always sweet to me and I think she could tell it wasn’t like that… I didn’t ogle or follow.. so I’m going to guess Fei Fei …?

@rohanpaul_ai @grok what's the original source for the video?
@GaryMarcus so true.

@rohanpaul_ai Love her so much!

@axiomwave_xbt @GaryMarcus LLMs are easy to show off and seem impressive because text modeling (not language!) is relatively easy and humans have powerful innate text modelers. Too bad for AI companies, they are not very useful in the grand scheme of things.

Artificial kinetic intelligence (AKI) moves beyond even physical AI, which is limited to the boundary of on-board vision and sensors. So Fei-Fei is correct that on-screen AI is narrowing the future perception of true AGI. The initial AKI framework already establishes a basis for deeper capability. Artificial Kinetic Intelligence (AKI): https://doi.org/10.5281/zenodo.19496506

@rtheoryxyz 💯

@rohanpaul_ai Language is the means we use to convey information.
But this is a false premise: all the serious AIs are already multimodal and understand at least images.

@rohanpaul_ai Often the results are like the AI STARED into the sun too long… And don’t do that!!

@rohanpaul_ai Check out LeWorldModel, which runs on a single GPU. Even Elon Musk inquired about it.

@rohanpaul_ai but the fact is, the fundamental architecture is the same: transformer, self-attention, multidimensional vector space
the datasets would just be images, audio, and video

The missing layer is not just multimodal AI.
It is physical observability.
Humans, machines, and environments do not only exchange information through interfaces. They are already physically coupled through motion, vibration, pressure, heat, electromagnetic activity, sensor contact, latency, and feedback.
That shared layer is not language in the symbolic sense.
It is coupled dynamics.
In that layer, information appears as timing relationships: phase, frequency, amplitude, synchronization, resonance, drift, phase-locking, coherence, and recovery after perturbation.
AI usually operates after this physical layer has already been converted into data: sensed, conditioned, digitized, encoded, and represented.
But the human, the machine, and the environment are already interacting before representation.
The deeper question is not only how to build better models of the world.
It is how to measure the stability of the coupled physical system itself.

@rohanpaul_ai overfitting on text feels too common
embodied ai is still the bottleneck

@rohanpaul_ai always been the gap in llm demos
embodied tasks are still solved by humans

@GaryMarcus LLMs are easiest to market and exploit, particularly to those who are ignorant of how the technology works (i.e. CEOs, executives, decision makers, investors, stockholders, etc.). #generativeAI #aigenerated #artificialintelligence #LLMs

@rohanpaul_ai embodied ai could unlock the next wave of practical applications beyond chatbots