
Collocational Bootstrapping Hypothesis Predicts Tradeoff In Subject-Verb Learning

🤖🧠NEW PAPER🧠🤖 Children & neural networks can learn syntax from linear strings of words. How do they do it? Our hypothesis: Word co-occurrence statistics provide cues to syntax! (I.e., a new type of bootstrapping to consider!) Paper: https://arxiv.org/abs/2605.20529 1/n

[Image: paper overview]
Title: "Collocational bootstrapping: A hypothesis about the learning of subject-verb agreement in humans and neural networks"
Authors: Claire Hobbs and Tom McCoy
Method: We trained many neural nets, varying how predictable a subject is given its verb. We tested them on subject-verb agreement.
Findings: With the right level of predictability, neural networks robustly generalize. The predictability of child-directed language is near the neural net optimum.
Conclusion: Statistical regularities in word co-occurrence can support the learning of abstract syntactic rules.
The text is accompanied by a graph showing neural-network accuracy as a function of the level of variability; accuracy peaks at an intermediate level of variability.
7:30 AM · May 22, 2026

We use subject-verb agreement as a case study. Consider "the dog in the park barks." How can a learner tell if the subject is "dog" or "park"?

Well, "dog" occurs with "barks" much more than "park" does, giving a clue that the subject is "dog"!

2/n
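To make the co-occurrence cue concrete, here is a minimal sketch in Python. It is not the paper's method: the toy corpus, the `NOUNS` and `VERBS` sets, and the helper names `cooccurrence_counts` and `guess_subject` are all hypothetical, chosen only to illustrate how a learner could use sentence-level co-occurrence counts to pick the more plausible subject.

```python
from collections import Counter

# Hypothetical toy corpus (not from the paper). Some sentences contain a
# distractor noun inside a prepositional phrase.
corpus = [
    "the dog barks",
    "the dog in the yard barks",
    "the dog by the park barks",
    "the bird in the park sings",
    "the bird sings",
    "the dog barks",
]
NOUNS = {"dog", "bird", "park", "yard"}  # assumed word classes
VERBS = {"barks", "sings"}

def cooccurrence_counts(sentences):
    """Count how often each noun appears in the same sentence as each verb."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        for noun in (w for w in words if w in NOUNS):
            for verb in (w for w in words if w in VERBS):
                counts[(noun, verb)] += 1
    return counts

def guess_subject(sentence, counts):
    """Pick the noun that has co-occurred most often with the sentence's verb."""
    words = sentence.split()
    verb = next(w for w in words if w in VERBS)
    candidates = [w for w in words if w in NOUNS]
    return max(candidates, key=lambda noun: counts[(noun, verb)])

counts = cooccurrence_counts(corpus)
print(guess_subject("the dog in the park barks", counts))  # prints "dog"
```

In this toy corpus "dog" shares a sentence with "barks" four times and "park" only once, so the count-based guess lands on "dog" even though "park" is linearly closer to the verb.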

Tom McCoy (@RTomMcCoy) · 2:30 PM · May 22, 2026

Perhaps this type of inference based on statistical co-occurrence provides a stepping stone toward figuring out the true structural relationship that defines where the subject is.

We term this idea the *collocational bootstrapping hypothesis*.

3/n

This framing predicts a tradeoff: If subject-verb pairings are too variable, there might not be enough signal to identify the structure. But if the pairings are too predictable, learners might not generalize to new subject-verb pairs.

4/n

The question: is there a sweet spot that manages this tension, a level of variability that supports robust generalization?

5/n

To test this, we train neural network language models on synthetic data across many conditions, varying how predictable subject-verb pairings are.

We do this by sampling subj-verb pairs from Zipfian distributions, varying the parameter that controls how skewed the distribution is.

6/n

[Image: Zipfian distributions for several values of the alpha parameter, which modulates how skewed the distribution is]
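As a rough illustration of this kind of sampling, the sketch below draws subjects for a single verb with probability proportional to 1 / rank^alpha. The subject list, the alpha values, and the function names are assumptions made for the example; the paper's actual data generation may differ. Entropy is included only as one convenient way to see how alpha changes predictability.

```python
import math
import random

def zipf_weights(n_items, alpha):
    """Normalized weights proportional to 1 / rank**alpha for ranks 1..n_items."""
    raw = [1.0 / (rank ** alpha) for rank in range(1, n_items + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def entropy_bits(probs):
    """Shannon entropy in bits; lower entropy means more predictable pairings."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def sample_subjects(subjects, alpha, n_samples, rng):
    """Sample subjects for one verb; earlier list items get higher Zipfian weight."""
    weights = zipf_weights(len(subjects), alpha)
    return rng.choices(subjects, weights=weights, k=n_samples)

# Hypothetical subjects for the verb "barks", ordered by rank.
subjects = ["dog", "puppy", "wolf", "seal", "fox"]
rng = random.Random(0)  # fixed seed so runs are repeatable

for alpha in (0.0, 1.0, 3.0):  # illustrative values, not the paper's settings
    probs = zipf_weights(len(subjects), alpha)
    sample = sample_subjects(subjects, alpha, 10, rng)
    print(f"alpha={alpha}: entropy={entropy_bits(probs):.2f} bits, sample={sample}")
```

With alpha = 0 every subject is equally likely (maximally variable pairings); as alpha grows, the top-ranked subject dominates and the pairings become highly predictable.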

We find that there is indeed a sweet spot of variability where the neural networks robustly generalize subject-verb agreement!

7/n

[Image: plot of neural net subject-verb agreement accuracy as a function of the variability in the training data. Accuracy peaks (near 1.0) at a medium level of variability.]

Since these conditions varied only in how predictable the subj-verb pairings were, this is evidence that properties of word co-occurrence statistics can have a substantial effect on how well statistical learning generalizes, sometimes supporting robust generalization.

8/n

We next analyzed child-directed language & found that the variability in it was close to the level that optimized neural net generalization.

This is evidence that child-directed language has the statistical properties that make collocational bootstrapping effective!

9/n

[Image: a Zipfian distribution fitted to child-directed language; the empirical distribution closely matches the theoretical one]
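The thread does not say how the Zipfian fit was done. One simple option, sketched here purely as an assumption, is a least-squares line through log(frequency) versus log(rank), whose negated slope estimates alpha. The counts below are made up for illustration and are not child-directed language data; `fit_zipf_alpha` is a hypothetical helper name.

```python
import math

def fit_zipf_alpha(frequencies):
    """Estimate alpha by least squares on log(frequency) vs. log(rank).

    Needs at least two positive counts. This is a simple illustrative
    estimator, not necessarily the fitting procedure used in the paper.
    """
    ranked = sorted((f for f in frequencies if f > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(f) for f in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return -covariance / variance  # Zipf: log f = c - alpha * log rank

# Made-up counts of subjects seen with one verb, built as 120 / rank.
toy_counts = {"dog": 120, "puppy": 60, "wolf": 40, "seal": 30}
print(f"estimated alpha: {fit_zipf_alpha(toy_counts.values()):.2f}")
```

Because the toy counts follow 120 / rank exactly, the estimate should come out very close to 1. Real frequency data is noisier, and log-log regression is known to be a rough estimator, so a more careful analysis might prefer maximum-likelihood fitting.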

I’m excited about this paper because statistical regularities sometimes pose a problem for learning abstract syntax (e.g., https://aclanthology.org/P19-1334/), but these results are an example of statistics *supporting* the right abstractions!

Paper link again: https://arxiv.org/abs/2605.20529

10/n

This project was led by Claire Hobbs (not on Twitter), a talented undergrad who just graduated from Yale. This paper is a condensed version of her senior thesis, which received a Robert J. Glushko Prize for Distinguished Undergraduate Research in CogSci! (@rjglushko)

11/n
