Collocational Bootstrapping Hypothesis Predicts Tradeoff In Subject-Verb Learning
🤖🧠NEW PAPER🧠🤖 Children & neural networks can learn syntax from linear strings of words. How do they do it? Our hypothesis: Word co-occurrence statistics provide cues to syntax! (I.e., a new type of bootstrapping to consider!) Paper: https://arxiv.org/abs/2605.20529
1/n
We use subject-verb agreement as a case study. Consider "the dog in the park barks." How can a learner tell if the subject is "dog" or "park"?
Well, "dog" occurs with "barks" much more than "park" does, giving a clue that the subject is "dog"!
2/n
Perhaps this type of inference based on statistical co-occurrence provides a stepping stone toward figuring out the true structural relationship that defines where the subject is.
We term this idea the *collocational bootstrapping hypothesis*.
3/n
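To make the "dog" vs. "park" clue concrete, here is a minimal sketch of the counting intuition. The toy corpus, the function names, and the "pick the noun with the highest count" rule are all illustrative assumptions, not the paper's method or data.

```python
from collections import Counter

# Hypothetical toy corpus (an assumption, not data from the paper),
# already lowercased and tokenized by whitespace.
TOY_CORPUS = [
    "the dog in the park barks",
    "the dog barks",
    "the dog near the house barks",
    "the kids in the park play",
    "the park closes",
]

def cooccurrence_with(verb, candidates, corpus):
    """Count the sentences in which each candidate noun appears alongside `verb`."""
    counts = Counter()
    for sentence in corpus:
        tokens = set(sentence.split())
        if verb in tokens:
            for noun in candidates:
                if noun in tokens:
                    counts[noun] += 1
    return counts

def likely_subject(verb, candidates, corpus):
    """Guess the subject as the candidate that co-occurs with `verb` most often."""
    counts = cooccurrence_with(verb, candidates, corpus)
    return max(candidates, key=lambda noun: counts[noun])

counts = cooccurrence_with("barks", ["dog", "park"], TOY_CORPUS)
print(counts["dog"], counts["park"])
print(likely_subject("barks", ["dog", "park"], TOY_CORPUS))
```

In this toy setup "dog" wins simply because it shows up with "barks" in more sentences than "park" does. The count says nothing about structure by itself; it is only the kind of cue the hypothesis suggests a learner could start from.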
This framing predicts a tradeoff: If subject-verb pairings are too variable, there might not be enough signal from which to identify structure. But if the pairings are too predictable, learners might not generalize to new subject-verb pairs.
4/n
The question: is there a sweet spot that manages this tension, a level of variability that supports robust generalization?
5/n
To test this, we train neural network language models on synthetic data across many conditions that vary how predictable subject-verb pairings are.
We do this by sampling subject-verb pairs from Zipfian distributions, varying the parameter that defines how skewed the distribution is.
6/n
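A rough sketch of what sampling subject-verb pairs from a Zipfian distribution with a tunable skew could look like. The thread does not spell out the exact setup, so everything here is an assumption for illustration: the function names, the word lists, and the choice to give each subject its own rotated verb ranking.

```python
import random

def zipf_weights(n_items, skew):
    """Unnormalized Zipfian weights: the item at rank r gets weight 1 / r**skew."""
    return [1.0 / (rank ** skew) for rank in range(1, n_items + 1)]

def sample_pairs(subjects, verbs, skew, n_samples, seed=0):
    """Sample (subject, verb) pairs; each verb is drawn from a Zipfian ranking."""
    rng = random.Random(seed)
    weights = zipf_weights(len(verbs), skew)
    pairs = []
    for _ in range(n_samples):
        i = rng.randrange(len(subjects))
        # Assumption: rotate the verb list so each subject has a different top-ranked verb.
        shift = i % len(verbs)
        ranked = verbs[shift:] + verbs[:shift]
        verb = rng.choices(ranked, weights=weights, k=1)[0]
        pairs.append((subjects[i], verb))
    return pairs

def top_verb_share(pairs, subjects, verbs):
    """Fraction of pairs in which a subject appears with its top-ranked verb."""
    hits = sum(1 for s, v in pairs if v == verbs[subjects.index(s) % len(verbs)])
    return hits / len(pairs)

# Hypothetical vocabulary, for illustration only.
SUBJECTS = ["dog", "bird", "child"]
VERBS = ["barks", "sings", "plays"]

for skew in (0, 1, 3):
    pairs = sample_pairs(SUBJECTS, VERBS, skew=skew, n_samples=2000)
    print(skew, round(top_verb_share(pairs, SUBJECTS, VERBS), 2))
```

With `skew=0` every verb is equally likely for every subject (maximally variable pairings); as `skew` grows, each subject appears almost exclusively with its top-ranked verb (maximally predictable pairings). Sweeping this one parameter is one way to generate the range of conditions described above.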

We find that there is indeed a sweet spot of variability where the neural networks robustly generalize subject-verb agreement!
7/n

Since these conditions varied only in the predictability of subject-verb pairings, this is evidence that properties of word co-occurrence statistics can have a substantial effect on how well statistical learning generalizes, sometimes supporting robust generalization.
8/n
We next analyzed child-directed language & found that the variability in it was close to the level that optimized neural net generalization.
This is evidence that child-directed language has the statistical properties that make collocational bootstrapping effective!
9/n

I’m excited about this paper because statistical regularities sometimes pose a problem for learning abstract syntax (e.g., https://aclanthology.org/P19-1334/), but these results are an example of statistics *supporting* the right abstractions!
Paper link again: https://arxiv.org/abs/2605.20529
10/n
This project was led by Claire Hobbs (not on Twitter), a talented undergrad who just graduated from Yale. This paper is a condensed version of her senior thesis, which received a Robert J. Glushko Prize for Distinguished Undergraduate Research in CogSci! (@rjglushko)
11/n