2d ago

Tal Linzen, associate professor at NYU and research scientist at Google, discusses on the Information Bottleneck podcast why children acquire language from roughly 100 million words while large language models require trillions of tokens.

As models get better at predicting the next word, they become worse models of human language processing.

Original post

New episode of The Information Bottleneck is out with @tallinzen, Associate Professor at NYU and Research Scientist at Google. Tal works at the intersection of cognitive science and language models, and he's one of the clearest voices on what humans and LLMs can actually teach us about each other. We talked about why children learn language from 100M words while LLMs need trillions, the surprising finding that as models get better at predicting the next word they become worse models of humans, inductive biases and synthetic languages, world models and whether transformers actually use them, BabyLM, and how AI coding tools are changing the way he teaches at NYU. I'm sure you will enjoy it!

11:32 AM · May 17, 2026

This was a super fun conversation, thanks for having me on the podcast!

Ravid Shwartz Ziv (@ziv_ravid)

6:32 PM · May 17, 2026 · 6.6K Views
6:44 PM · May 17, 2026 · 3.5K Views

The episode - https://www.the-information-bottleneck.com/language-cognition-and-the-limits-of-llms/

Ravid Shwartz Ziv (@ziv_ravid)

6:32 PM · May 17, 2026 · 6.6K Views
6:32 PM · May 17, 2026 · 475 Views