Collocational Bootstrapping Hypothesis Predicts Tradeoff In Subject-Verb Learning

Tom McCoy@RTomMcCoy

🤖🧠NEW PAPER🧠🤖 Children & neural networks can learn syntax from linear strings of words. How do they do it? Our hypothesis: Word co-occurrence statistics provide cues to syntax! (I.e., a new type of bootstrapping to consider!)

Paper: https://arxiv.org/abs/2605.20529

1/n

2:30 PM · May 22, 2026 · 1.3K Views

Tom McCoy@RTomMcCoy

We use subject-verb agreement as a case study. Consider "the dog in the park barks." How can a learner tell if the subject is "dog" or "park"?

Well, "dog" occurs with "barks" much more often than "park" does, giving a clue that the subject is "dog"!

2/n

2:31 PM · May 22, 2026 · 123 Views
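
The "dog" vs. "park" cue can be sketched as a plain co-occurrence count. This is only an illustration of the intuition, not the method used in the paper: the toy corpus and the helper names `cooccurrence_counts` and `guess_subject` are hypothetical.

```python
from collections import Counter

# Hypothetical toy corpus: (nouns that appear before the verb, the verb).
corpus = [
    (["dog"], "barks"),
    (["dog", "yard"], "barks"),
    (["dog", "park"], "barks"),
    (["child", "park"], "laughs"),
    (["child"], "laughs"),
    (["bird", "park"], "sings"),
]

def cooccurrence_counts(corpus):
    """Count how often each noun appears in a sentence with each verb."""
    counts = Counter()
    for nouns, verb in corpus:
        for noun in nouns:
            counts[(noun, verb)] += 1
    return counts

def guess_subject(candidates, verb, counts):
    """Pick the candidate noun that co-occurs most often with the verb."""
    return max(candidates, key=lambda noun: counts[(noun, verb)])

counts = cooccurrence_counts(corpus)
# "the dog in the park barks": which noun is the likelier subject?
guess = guess_subject(["dog", "park"], "barks", counts)
print(guess)
```

Raw counts are the simplest possible statistic here; a learner could just as well use a normalized association measure, which is a design choice the thread does not specify.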

Tom McCoy@RTomMcCoy

Perhaps this type of inference based on statistical co-occurrence provides a stepping stone toward figuring out the true structural relationship that defines where the subject is.

We term this idea the *collocational bootstrapping hypothesis*.

3/n

2:31 PM · May 22, 2026 · 91 Views

Tom McCoy@RTomMcCoy

This framing predicts a tradeoff: If subject-verb pairings are too variable, there might not be enough signal to identify structure. But if the pairings are too predictable, learners might not generalize to new subject-verb pairs.

4/n

2:31 PM · May 22, 2026 · 45 Views

Tom McCoy@RTomMcCoy

The question: is there a sweet spot that manages this tension, a level of variability that supports robust generalization?

5/n

2:31 PM · May 22, 2026 · 47 Views

Tom McCoy@RTomMcCoy

To test this, we train neural network language models on synthetic data across many conditions that vary how predictable subject-verb pairings are.

We do this by sampling subject-verb pairs from Zipfian distributions, varying a parameter that defines how skewed the distribution is.

6/n

Zipfian distributions with various values of the alpha parameter that modulates how skewed the distribution is

2:32 PM · May 22, 2026 · 49 Views
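
A minimal sketch of this kind of sampling, under stated assumptions: pair probabilities follow p(rank r) proportional to 1 / r^alpha, and pairs are assigned ranks in an arbitrary shuffled order. The thread does not say how the paper assigns ranks or which values of alpha it uses, so the helper names (`zipf_weights`, `entropy_bits`, `sample_pairs`), the vocabulary, and the alpha values below are all hypothetical.

```python
import itertools
import math
import random

def zipf_weights(n_items, alpha):
    """Normalized Zipfian probabilities: p(rank r) proportional to 1 / r**alpha."""
    raw = [1.0 / (rank ** alpha) for rank in range(1, n_items + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def entropy_bits(probs):
    """Shannon entropy in bits; higher means more variable pairings."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def sample_pairs(subjects, verbs, alpha, n_samples, seed=0):
    """Sample subject-verb pairs from a Zipfian distribution over all pairs."""
    rng = random.Random(seed)
    pairs = list(itertools.product(subjects, verbs))
    rng.shuffle(pairs)  # assumption: rank order of pairs is arbitrary
    weights = zipf_weights(len(pairs), alpha)
    return rng.choices(pairs, weights=weights, k=n_samples)

subjects = ["dog", "cat", "bird", "child"]   # hypothetical vocabulary
verbs = ["barks", "sleeps", "sings", "runs"]

for alpha in (0.0, 1.0, 2.0):
    probs = zipf_weights(len(subjects) * len(verbs), alpha)
    print(f"alpha={alpha}: entropy={entropy_bits(probs):.2f} bits")

sample = sample_pairs(subjects, verbs, alpha=1.0, n_samples=1000)
```

With alpha = 0 the distribution is uniform (maximally variable pairings); as alpha grows, a few pairs dominate and the pairings become more predictable.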

Tom McCoy@RTomMcCoy

We find that there is indeed a sweet spot of variability where the neural networks robustly generalize subject-verb agreement!

7/n

Plot showing neural net subject-verb agreement accuracy as a function of the variability in the training data. Accuracy is optimized (and is near 1.0) when the level of variability is medium.

2:33 PM · May 22, 2026 · 42 Views

Tom McCoy@RTomMcCoy

Since these conditions varied only in how predictable the subject-verb pairings were, this is evidence that properties of word co-occurrence statistics can have a substantial effect on how well statistical learning generalizes, sometimes supporting robust generalization.

8/n

2:33 PM · May 22, 2026 · 33 Views

Tom McCoy@RTomMcCoy

We next analyzed child-directed language & found that the variability in it was close to the level that optimized neural net generalization.

This is evidence that child-directed language has the statistical properties that make collocational bootstrapping effective!

9/n

A Zipfian distribution fitted to child-directed language; the empirical distribution matches the theoretical one closely

2:34 PM · May 22, 2026 · 33 Views
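
One simple way to estimate a Zipfian skew parameter from observed pair counts is a least-squares fit of log frequency against log rank. The thread does not state which fitting procedure the paper uses, so this is a sketch of a generic technique rather than the authors' analysis; `fit_zipf_alpha` and the synthetic counts are hypothetical.

```python
import math

def fit_zipf_alpha(counts):
    """Estimate alpha as the negated slope of log(frequency) vs. log(rank)."""
    freqs = sorted((c for c in counts if c > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return -(cov / var)

# Synthetic counts that follow an exact power law with alpha = 1.2.
ideal_counts = [1000.0 / rank ** 1.2 for rank in range(1, 51)]
alpha_hat = fit_zipf_alpha(ideal_counts)
print(round(alpha_hat, 3))
```

Log-log regression is easy to read but can be a crude estimator on real, noisy counts; a maximum-likelihood fit is a common alternative.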

Tom McCoy@RTomMcCoy

I’m excited about this paper because statistical regularities sometimes pose a problem for learning abstract syntax (e.g., https://aclanthology.org/P19-1334/), but these results are an example of statistics *supporting* the right abstractions!

Paper link again: https://arxiv.org/abs/2605.20529

10/n

2:36 PM · May 22, 2026 · 34 Views

Tom McCoy@RTomMcCoy

This project was led by Claire Hobbs (not on Twitter), a talented undergrad who just graduated from Yale. This paper is a condensed version of her senior thesis, which received a Robert J. Glushko Prize for Distinguished Undergraduate Research in CogSci! (@rjglushko)

11/n

2:36 PM · May 22, 2026 · 70 Views
