/AI · 19h ago

Study Finds Overlap Between Language Model Features And Brain Responses

5128277811.3K

#1490

Original post

Samuel Hammond 🦉#1490

Michael Lepori@Michael_Lepori

We find that the features most useful for predicting LM representations *in general* are the features that best predict brain responses to language. This suggests a non-trivial overlap in the features used to represent language in LMs and the brain!

12:30 PM · Jun 9, 2026 · 699 Views

Sentiment

Users praise the new preprint on using SAEs to identify features driving LM-brain alignment as a fascinating direction and important step toward deeper mechanistic understanding.

Pos

100.0%

Neg

0.0%

1 comment with sentiment.

Cluster Engagement

Posts from X

Most Activity

VIEWS 626 · LIKES 6

Michael Lepori@Michael_Lepori

We validate our model framework by asking if we can recover existing interpretations of brain responses to language. We use our models to predict voxels that have shown tuning to either processing difficulty or meaning abstractness.

19h62661

BOOKMARKS 2

Michael Lepori@Michael_Lepori

Check out the paper for WAY more details, analyses, and interpretations: https://arxiv.org/abs/2606.06857

19h25932

RETWEETS 24

Michael Lepori@Michael_Lepori

🚨New preprint!🚨 We know that LM representations can be used to predict brain responses to language. But what *features* of these representations underlie this alignment? We use SAEs to find out!

19h10.7K12376

REPLIES 1

Ward Plunet@StartupYou

@Michael_Lepori @threadreaderapp please #unroll

17h256

Michael Lepori@Michael_Lepori

I had a ton of fun collaborating with @GretaTuckute and Kendrick Kay on this project! We hope that these close collaborations between mechanistic interpretability researchers and neuroscientists can continue to push both fields forward.

19h35751

Michael Lepori@Michael_Lepori

However, different regions differ in how strongly they rely on these features. For example, frontal regions tend to be more strongly predicted by surprisal than SAE/LM-based “content” features.

19h44931

Michael Lepori@Michael_Lepori

We find that SAE features used to predict a voxel in one region from one participant tend to generalize to other regions and/or other participants, indicating a largely shared feature basis!

19h28921

Michael Lepori@Michael_Lepori

We find that surprisal alone predicts processing difficulty voxels. In contrast, SAE features are required to predict meaning abstractness voxels. Further, we interpret the features, and find that they relate to aspects of concreteness (i.e., descriptions of scenery).

19h28241
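
For readers unfamiliar with the term, surprisal is the negative log-probability a language model assigns to text. The sketch below is one plausible way to turn per-token probabilities into a single sentence-level number. The function name `sentence_surprisal`, the use of bits (log base 2), and the sum-versus-mean option are assumptions for illustration; the thread does not say how the authors aggregate surprisal.

```python
import math

def sentence_surprisal(token_probs, average=False):
    """Return sentence surprisal in bits from per-token probabilities.

    token_probs: probabilities p(token | preceding context), each in (0, 1].
    average: if True, return the mean per-token surprisal instead of the sum.
    Both the unit and the aggregation are assumptions, not the paper's definition.
    """
    bits = [-math.log2(p) for p in token_probs]
    total = sum(bits)
    return total / len(bits) if average else total

# Hypothetical probabilities for a two-token sentence.
example_total = sentence_surprisal([0.5, 0.25])
example_mean = sentence_surprisal([0.5, 0.25], average=True)
```

Lower-probability sentences receive higher surprisal, which is why the measure is commonly used as a proxy for processing difficulty.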

Michael Lepori@Michael_Lepori

Three key results:
- Our models can interpret uncharacterized voxel populations
- We find a shared feature basis in the fronto-temporal lang. network, with individual variation
- Widely-used features for reconstructing LM representations are also useful for predicting brain responses

19h4932

Michael Lepori@Michael_Lepori

Extended thread: We introduce Augmented Sparse Encoding Models, a framework which uses SAEs to project LM hidden states into an interpretable basis AND includes sentence surprisal as a feature. We then learn a sparse linear mapping from this feature basis to brain responses.

19h4242
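
The pipeline described in this post (SAE projection, plus surprisal, plus a sparse linear map to brain responses) can be sketched roughly as follows. This is a toy illustration on synthetic data, not the authors' implementation: the ReLU encoder form, the ISTA solver for the L1-penalized regression, all shapes, and every name (`sae_encode`, `build_features`, `fit_sparse_linear`) are assumptions.

```python
import numpy as np

def sae_encode(hidden, W_enc, b_enc):
    """Project LM hidden states into a non-negative SAE feature basis (assumed ReLU encoder)."""
    return np.maximum(hidden @ W_enc + b_enc, 0.0)

def build_features(hidden, surprisal, W_enc, b_enc):
    """Concatenate SAE activations with sentence surprisal as one extra column."""
    return np.column_stack([sae_encode(hidden, W_enc, b_enc), surprisal])

def fit_sparse_linear(X, y, l1=0.1, n_iter=500):
    """L1-regularized least squares via ISTA (proximal gradient descent)."""
    n, d = X.shape
    w = np.zeros(d)
    # Step size 1/L, with L the Lipschitz constant of the squared-loss gradient.
    step = n / (np.linalg.norm(X, 2) ** 2)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        z = w - step * grad
        w = np.sign(z) * np.maximum(np.abs(z) - step * l1, 0.0)  # soft threshold
    return w

# Synthetic stand-ins: 200 sentences, 16-dim hidden states, 32 SAE features.
rng = np.random.default_rng(0)
n_sent, d_model, n_feat = 200, 16, 32
hidden = rng.normal(size=(n_sent, d_model))
surprisal = rng.uniform(1.0, 10.0, size=n_sent)
W_enc = rng.normal(size=(d_model, n_feat))
b_enc = np.zeros(n_feat)

X = build_features(hidden, surprisal, W_enc, b_enc)
X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)  # standardize columns

# A fake "voxel" driven by one SAE feature and by surprisal (last column).
true_w = np.zeros(n_feat + 1)
true_w[3], true_w[-1] = 1.0, 0.5
voxel = X @ true_w + 0.1 * rng.normal(size=n_sent)
voxel = voxel - voxel.mean()

weights = fit_sparse_linear(X, voxel, l1=0.1)
fit_mse = np.mean((X @ weights - voxel) ** 2)
baseline_mse = np.mean(voxel ** 2)
```

In a setup like this, the nonzero entries of `weights` indicate which SAE features (or surprisal) the model relies on for a given voxel, which is what makes the mapping interpretable.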

Michael Lepori@Michael_Lepori

Next, we identify and interpret a previously uncharacterized, but reliable, voxel population whose responses are predicted by "people-specific" features (i.e., descriptions of people doing things, relationships, pronouns).

19h2252

Michael Lepori@Michael_Lepori

These voxels 1⃣ are not uniformly present across participants, demonstrating how we can investigate individual variability in linguistic meaning representations, and 2⃣ largely reside outside the core fronto-temporal lang. network, but near areas associated with social cognition

19h2052

Michael Lepori@Michael_Lepori

We apply our model framework to a high-field 7T fMRI dataset of eight participants listening to 200 diverse sentences.

19h3001

Michael Lepori@Michael_Lepori

Thus, our framework recovers interpretations from prior work, indicating that it can provide accurate interpretations of brain responses.

19h2171

Michael Lepori@Michael_Lepori

Last but not least, we investigate the *properties* of the most predictive SAE features. Because we use Matryoshka SAEs, our feature basis is organized into bins that represent increasing granularity...

19h1991

Michael Lepori@Michael_Lepori

Finally, we investigate a larger set of voxels across the fronto-temporal language network to ask whether different brain regions and different participants rely on a shared feature basis during language comprehension.

19h1991

Michael Lepori@Michael_Lepori

...the first bin is small, and is broadly useful for reconstructing many LM representations (“most widely applicable features”), the next bin contains more granular features (and so on).

19h191
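
The bin structure described here can be illustrated with a small sketch that assigns feature indices to nested bins and counts how many selected features land in each. The cumulative bin boundaries (`prefix_sizes`), the example feature indices, and the function names are hypothetical; the thread does not give the actual bin sizes.

```python
import numpy as np

def feature_bins(feature_idx, prefix_sizes):
    """Map feature indices to bin numbers.

    prefix_sizes are cumulative boundaries: bin 0 covers [0, prefix_sizes[0]),
    bin k covers [prefix_sizes[k-1], prefix_sizes[k]).
    """
    return np.searchsorted(np.asarray(prefix_sizes), feature_idx, side="right")

def bin_counts(feature_idx, prefix_sizes):
    """Count how many of the given features fall into each bin."""
    bins = feature_bins(np.asarray(feature_idx), prefix_sizes)
    return np.bincount(bins, minlength=len(prefix_sizes))

# Hypothetical dictionary of 64 features split into bins of size 4, 12, and 48.
prefix_sizes = [4, 16, 64]
top_features = [0, 2, 3, 5, 20]  # made-up indices of the most predictive features

counts = bin_counts(top_features, prefix_sizes)
bin_widths = np.diff([0] + prefix_sizes)
fraction_used = counts / bin_widths  # share of each bin that was selected
```

Normalizing by bin width matters because later bins are much larger, so raw counts alone would understate how heavily the small first bin is used.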

Thread Reader App@threadreaderapp

@StartupYou @Michael_Lepori @StartupYou Hello, please find the unroll here: https://threadreaderapp.com/thread/2064429866100072557.html Have a good day. 🤖

17h7

EB1A Experts@eb1aexperts

@Michael_Lepori Fascinating direction. Moving from demonstrating alignment to understanding which features actually drive it feels like an important step toward more interpretable language and neuroscience models.

8h3