Tech · 15h ago

Model Capacity Enables Retention of Rare Task Updates Amid Gradient Interference

Sentiment

Users praise the paper, which argues that model capacity enables retention of rare task updates amid gradient interference, for laying solid foundations even though many open questions remain.

Pos: 100.0%

Neg: 0.0%

1 comment with sentiment.

Posts from X

David Alvarez Melis@elmelis

Plenty of open questions remain about how to choose mixtures for a given scale, transfer to real pretraining, post-training as a separate axis, etc., but this paper lays solid foundations for thinking about all of these. Link: https://arxiv.org/abs/2605.29548

Replies: 1

David Alvarez Melis@elmelis

It also suggests that the classic (learning theory) way to think about model capacity in isolation misses an important part of the story. We ought to think about capacity 𝘳𝘦𝘭𝘢𝘵𝘪𝘷𝘦 𝘵𝘰 task diversity in the training data.

David Alvarez Melis@elmelis

This (+ lots of recent work on fine-grained scaling laws and data mixtures, by us and others) confirms that scale and data composition aren't independent levers: what you can learn is a **joint** function of both.
