4h ago

GoodfireAI shares research showing that sparse autoencoders tile and shatter the curved neural manifolds found in large models rather than capturing them as single linear directions, recasting unsupervised manifold discovery as an inverse Ising problem

Visualizations map features across a manifold spanning 1800 to 1998.

Original post

How do SAEs capture concept manifolds? 🍩 I think this is important work. we study how SAEs handle the geometric structures we've identified and find they tile/shatter them in a particular way we characterize, letting us recast unsupervised manifold discovery as inverse Ising

8:54 AM · May 21, 2026

@CFGeek is right! This is exactly what we tried to do with MFA:

Charles Foster @CFGeek

The next step here is to go for direct unsupervised recovery of feature geometry from activations, rather than this two-step SAE → clustering stuff.

5:05 PM · May 21, 2026 · 1.2K Views
7:16 PM · May 21, 2026 · 291 Views

@CFGeek My student @OrShafran has made this attempt actually: https://arxiv.org/abs/2602.02464 Works pretty good

7:04 PM · May 21, 2026 · 103 Views

we have an automatic shape-finder, so you can find the shapes your model thinks in and receive twitter clout.

code is open-source, or you can also get silico to find your manifolds for you

Goodfire @GoodfireAI

The most popular way to interpret AI is missing the bigger picture. Models think in curved shapes. But sparse autoencoders (SAEs) work with straight lines. Can they still capture models’ curved neural geometry? Yes, but not how you might think! (1/7)

3:45 PM · May 21, 2026 · 39.2K Views
4:21 PM · May 21, 2026 · 838 Views

Super excited to have this paper finally out! So many nuggets here, but a critical highlight: you should *not* interpret SAE features in isolation. The population geometry is where it's all at! Similar to this image of us @GoodfireAI folks playing out the elephant parable. :P

4:13 PM · May 21, 2026 · 3.2K Views

For the physics bros: if you think of SAE features as mere on-off switches, that oughta remind you of Ising models. You can use this to unsupervisedly discover manifolds from SAE activities! Check out code link below!

4:28 PM · May 21, 2026 · 3.6K Views
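
To make the on-off-switch picture above concrete, here is a minimal sketch of one way an inverse Ising fit could group SAE features. It is not the Goodfire code: it swaps in the textbook independent-pair approximation, where the coupling between two binary units is the log odds ratio of their joint on/off counts. Every name here (binarize, pair_couplings, coupled_groups, toy_activations) is hypothetical, and the toy data is invented: eight features that tile a ring of 24 positions, plus two features driven by an unrelated variable.

```python
import math
from itertools import combinations


def binarize(acts, threshold=0.0):
    """Treat each feature as an on/off switch: 1 if its activation exceeds threshold."""
    return [[1 if a > threshold else 0 for a in row] for row in acts]


def pair_couplings(spins, pseudocount=0.5):
    """Independent-pair estimate of Ising couplings for 0/1 units.

    For a two-unit model p(s_i, s_j) ~ exp(h_i s_i + h_j s_j + J s_i s_j),
    J equals log(p11 * p00 / (p10 * p01)). The pseudocount avoids log(0).
    """
    n = len(spins[0])
    J = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        counts = {(a, b): pseudocount for a in (0, 1) for b in (0, 1)}
        for row in spins:
            counts[(row[i], row[j])] += 1
        odds = counts[1, 1] * counts[0, 0] / (counts[1, 0] * counts[0, 1])
        J[i][j] = J[j][i] = math.log(odds)
    return J


def coupled_groups(J, min_coupling=0.5):
    """Connected components of the graph whose edges are couplings above min_coupling."""
    n = len(J)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if J[i][j] > min_coupling:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def toy_activations():
    """Invented data: features 0-7 tile a ring of 24 positions with overlapping
    arcs; features 8-9 depend only on a separate 6-state variable c."""
    rows = []
    for pos in range(24):
        for c in range(6):
            tiles = [1.0 if (pos - 3 * k) % 24 < 5 else 0.0 for k in range(8)]
            extra = [1.0 if c in (0, 1) else 0.0, 1.0 if c in (1, 2) else 0.0]
            rows.append(tiles + extra)
    return rows


J = pair_couplings(binarize(toy_activations()))
groups = coupled_groups(J)
print(groups)
```

In this toy setup, neighbouring tiles co-fire more often than chance and so get positive couplings, while tiles on opposite sides of the ring never co-fire and get negative ones. Chaining the positive couplings is what links the whole ring into one group. A real inverse Ising fit would estimate all couplings jointly rather than pair by pair.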