
Guidelabs AI Finds Larger Models Grow More Interpretable With Built-In Training

Original post
Julius Adebayo @juliusadml

🧵 we've been teasing this @guidelabsai for a while: interpretability scales. Not survives scaling, but models actually get easier to understand, as you scale, if trained correctly. Let me share some of my favorite results in this thread.

Guide Labs @guidelabsai

Today we’re announcing a finding that breaks a core assumption in AI: that bigger models are harder to understand.

We show the opposite. When interpretability is built into training, models become MORE understandable as they become more capable.

2:47 PM · Jun 11, 2026 · 1.5K Views
Posts from X
Julius Adebayo @juliusadml

every interpretability metric we track improves with scale: representations get more disentangled and more aligned with humans as you scale from 10 million to 8 billion params.

Julius Adebayo @juliusadml

you can trace your model's output to training data, input, and concepts in its representations.

Julius Adebayo @juliusadml

extrapolation of validation loss to within 0.11 nats :) interpretable models really do scale predictably.

Julius Adebayo @juliusadml

Read this twitter article for a summary of the highlights:

Guide Labs @guidelabsai

http://x.com/i/article/2065113469607895040

Julius Adebayo @juliusadml

And if you are as obsessed with this as we are, here is a preview of our tech report: https://www.guidelabs.ai/papers/scaling-inherently-interpretable-language-models.pdf
