Sergey Levine argues LLMs develop emergent capabilities by composing simpler skills in novel ways instead of imitating training data · Digg

Posts from X

Dwarkesh Patel@dwarkesh_sp

Check out the full interview with one of the top robotics researchers: https://www.dwarkesh.com/p/sergey-levine

Dwarkesh Patel@dwarkesh_sp

We pre-train LLMs on the whole of the internet. You might think this explains how they learn so many emergent capabilities: the knowledge is implicit in the training data.

But in fact models can do things that were never demonstrated anywhere in training!

@svlevine argues that the real source of emergent capabilities is compositionality:

Cody Blakeney@code_star

I'm willing to believe this is true, but I also think people who make statements like this haven't seen the amount of crazy shit that is actually on the internet. It might be very hard for you to track down, but there is almost always an example for anything.

Anirudh Goyal@anirudhg9119

@dwarkesh_sp

This is the phenomenon our paper (with @prfsanjeevarora) tried to formalize: as models scale, basic skills can compose into complex skills.

That gives a theory for emergence beyond direct imitation of training data.

https://arxiv.org/abs/2307.15936

Aryaman Arora@aryaman2020

one problem with this specific example is that the Journal of the IPA up until some year actually did require all submitted articles to be written in the IPA! this is a source of paper-length texts in IPA in pretraining

Anirudh Goyal@anirudhg9119

@code_star This is the phenomenon our paper tried to formalize: as models scale, reducing next-token prediction loss leads to the formation of skills, and skills can compose into complex skills.

That gives a theory for emergence beyond direct imitation of training data.

http://arxiv.org/abs/2307.15936

Elan Barenholtz@ebarenholtz

The remarkable thing we’ve learned is that language itself contains this combinatoric capacity to generate infinite “accurate” continuations of any sequence, based on its topological structure. This structure is what LLMs learn, and it is what thinking linguistically IS, in them and in us too.

Yoav Artzi@yoavartzi

Never underestimate the Internet

(a) almost anything you can think of is there; (b) for every claim of emergence there exists a user that will point to existing data showing exactly the same ;)

Aryaman Arora@aryaman2020

@dwarkesh_sp e.g.

Virgil Maro@_virgil19

@dwarkesh_sp @svlevine compositionality explains the novel outputs but not the novel primitives. it recombines parts it already has, it cant acquire a new one from one example like you do. thats the gap

Chris Offner@chrisoffner3d

@code_star Yeah, I don't know how anyone can claim that their model trained on the internet generalizes "out of distribution" when *you cannot know* the training distribution! Nobody knows what is or isn't in a training set this large!

Patrick@sterasmas

@dwarkesh_sp @svlevine how is this generalization in a meaningful sense? it composes simple mappings but its 1) generating a recipe then 2) translating normal english into IPA. fundamentally both of these are presented in the training data. help me out here pls…

Christopher Potts@ChrisGPotts

@aryaman2020 This also brought me back to my college days where I had an assignment to read long passages in IPA. I remember my delight upon noticing that I was doing a Scottish accent! It's among my top 10 moments that made me a linguist.

Clay@clay_phi

@dwarkesh_sp @svlevine Try to elicit a single original idea, or even unique insights from correlating data, from a consumer grade LLM. There is close to zero creativity in current language models

Phil Trubey@PTrubey

@dwarkesh_sp @svlevine Dwarkesh, I’m surprised @svlevine in this podcast said robotic useful tasks are a year away. Here’s one *of many* robots using Physical Intelligence to do complex useful tasks *in production* today.

Virgil Maro@_virgil19

@marcsh @oc_xaoc @dwarkesh_sp @svlevine poets recombine but they get changed by the line they land on, the next poem starts from a place the last one moved them to. the model emits and resets

lost wanderer@InfinitywaraS

@dwarkesh_sp @svlevine " your Personality is eventually defined by group of 4-5 people you surround yourself with in life "

- so when you hyperscale Compositionality ..... you get cross domain synthesis ??

/// //@marcsh

@_virgil19 @oc_xaoc @dwarkesh_sp @svlevine Sorta like human poets . . . .

M Sh@MSh373916531641

@dwarkesh_sp @svlevine Geoffrey Hinton claimed in a podcast a few months ago that current gen models can learn new things.

𝕏@oc_xaoc

@_virgil19 @dwarkesh_sp @svlevine Do you think we have explored every combination though?

Flowers ☾@flowersslop

@dwarkesh_sp @svlevine > you might think this explains how they learn so many emergent capabilities > But in fact models can do things that were never demonstrated anywhere in training!

uhm yeah thats why its called emergent
