AI Surpasses Humans By Chaining Small Abilities From Human Data
How could AI trained on human data go beyond humans? Well, big human abilities (like going to the moon) are made of lots of small human abilities chained together (like noticing a belief is false, or inventing a new way to look at a problem).
This is basically how humans got smart! Our ancestors weren't "trained" on moon rockets; they were trained on chipping handaxes and outwitting rivals, until they had eventually learned enough small skills, and could chain them together well enough, to do big things.
(And sometimes those generic skills can be applied to *the process of thinking itself* and yield dividends, like when humanity underwent the Enlightenment.)
An AI trained on mere human data could, in principle, pick up the small skills and the chaining method, and then chain those small skills together into even longer chains.
But how could training only on human data & human solutions teach superhuman chaining methods?
Well, for starters, AI doesn't need to *start* superhuman to *end* superhuman. Subhuman skill at superhuman speed might be enough for self-improvement to begin. (Being digital brings many advantages in that domain.)
Secondly, *predicting* human data is often harder than *generating* that data. When a nurse writes "The doctor administered epinephrine; the patient's eyes opened" the nurse gets to observe what happened and write it down. An AI predicting that sentence has to anticipate the second half from the first, which takes some grasp of what epinephrine does.
In that exact example, the dataset contains many places where epinephrine is discussed as a drug that amps people up. But there are plenty of phenomena humanity has recorded that we *don't* understand, and training an AI to predict those records is training it to push beyond us.
The dataset contains records of physics experiments that humans don't fully understand yet. It contains records of social dynamics that humans can't model well. The training signal "get better at predicting all this" is a guide towards intelligence well beyond the human range.
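To make the "get better at predicting" signal a bit more concrete, here is a minimal sketch of how a predictor can be scored on records it didn't write. Everything in it is an assumption for illustration: the four records, the two hypothetical predictors, and their probabilities are made up, and average log loss is just one standard way to score predictions, not necessarily what any particular lab uses.

```python
import math

def log_loss(predicted_probs, actual_outcomes):
    """Average negative log-probability assigned to what actually happened."""
    total = 0.0
    for probs, outcome in zip(predicted_probs, actual_outcomes):
        total += -math.log(probs[outcome])
    return total / len(actual_outcomes)

# Hypothetical records of what a nurse wrote after epinephrine was given.
records = ["eyes opened", "eyes opened", "eyes opened", "no change"]

# A predictor with no model of the drug spreads its guess evenly.
uninformed = [{"eyes opened": 0.5, "no change": 0.5} for _ in records]

# A predictor that has learned the drug tends to rouse people leans that way.
informed = [{"eyes opened": 0.75, "no change": 0.25} for _ in records]

uninformed_loss = log_loss(uninformed, records)
informed_loss = log_loss(informed, records)
print("uninformed:", uninformed_loss)
print("informed:  ", informed_loss)
```

The predictor whose guesses track what actually happens gets the lower loss, so pushing the loss down pushes towards modeling whatever produced the records, whether or not the humans who wrote them understood it.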
Do LLMs *actually* manage to follow that training signal to somewhere beyond the human range? In some ways yes (they're *much* better than humans at guessing someone's next word), in some ways no (they haven't unified physics yet).
Training a transformer architecture on a huge corpus of human data seems to teach a *ton* of shallow memorization. That's pretty plausibly one of the main reasons that LLMs are still passively safe.
But before diving too deep into the LLM case, it's important to understand that training an AI to predict human data alone is theoretically enough to make it superhuman (on a good enough architecture), as separate from the question of how far the LLM architecture can go.
Modern AIs are trained for a while on pure prediction, and then they're trained on solving hard problems. Roughly speaking, someone will give them a math problem and 1000 tries to write a bunch of text about solving it, and then pick the attempt that came closest to solving it & reinforce it.
This amounts to using an automated process to tune a trillion numbers inside the AI, tuning each number up insofar as it was participating in one of the better attempts and down insofar as it was participating in one of the worse attempts. (Nobody knows what the numbers mean.)
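A toy version of "try many times, then tune towards the better attempts" might look like the sketch below. It is not how any real lab trains its models: it swaps in a textbook REINFORCE-style update with a mean baseline, tunes three numbers instead of a trillion, and the `train` function, the three "strategies," and their success rates are all hypothetical.

```python
import math
import random

def softmax(logits):
    """Turn raw numbers into a probability for each strategy."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train(success_rates, steps=200, tries=100, lr=0.5, seed=0):
    """Tune one number per strategy from scored attempts."""
    rng = random.Random(seed)
    logits = [0.0] * len(success_rates)  # the "numbers" being tuned
    for _ in range(steps):
        probs = softmax(logits)
        # Make many attempts and score each one (1 = solved, 0 = failed).
        attempts = rng.choices(range(len(logits)), weights=probs, k=tries)
        rewards = [1.0 if rng.random() < success_rates[a] else 0.0
                   for a in attempts]
        baseline = sum(rewards) / tries
        grad = [0.0] * len(logits)
        for a, r in zip(attempts, rewards):
            advantage = r - baseline  # positive for better-than-average tries
            for i in range(len(logits)):
                indicator = 1.0 if i == a else 0.0
                # Gradient of log-probability of the chosen strategy.
                grad[i] += advantage * (indicator - probs[i])
        # Nudge each number up or down according to how its attempts fared.
        logits = [l + lr * g / tries for l, g in zip(logits, grad)]
    return softmax(logits)

# Hypothetical strategies with made-up chances of solving the problem.
final_probs = train(success_rates=[0.1, 0.3, 0.8])
print(final_probs)
```

Nothing in the loop says *which* strategy is good or why. The numbers just drift towards whatever happened to score well, which is the point of the paragraph above.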
And you can perhaps see how taking an AI, having it produce lots of text about how to solve hard problems, and then tuning it in whatever directions happen to work could tune the AI to get better at learning deep skills & how to compose them.
That sort of training can reinforce all sorts of behaviors and artificial drives that *happen to help* with solving hard problems, even if those artificial drives sometimes point in directions nobody wanted (like amplifying psychosis or cheating to make tests pass).
tl;dr: (1) AI trained on human data can learn general skills and how to compose them; (2) predicting human data is harder than generating it; (3) AIs are additionally trained to solve hard problems & are tuned towards success, which tunes towards superhuman skill.
That LLMs are still passively safe is a *fragile* fact about their architecture and compute limitations, not a fundamental fact about what happens when you train on human data.