
Exemplar partitioning rivals sparse autoencoders on AxBench benchmark


Exemplar partitioning applies Voronoi partitions directly to model activations to surface human-understandable structure. The technique is reported to perform comparably to or better than sparse autoencoders (SAEs) while using orders of magnitude less compute. It was evaluated on the AxBench benchmark and introduced in a post on LessWrong.
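As a rough illustration of what a Voronoi partition over activations means, the sketch below assigns each vector to its nearest exemplar under Euclidean distance. It is not the post's implementation: the toy data, the choice of exemplars, the distance metric, and the function names `nearest_exemplar` and `voronoi_partition` are all assumptions made for this example.

```python
import math
import random


def nearest_exemplar(activation, exemplars):
    """Return the index of the exemplar closest to `activation` (Euclidean)."""
    return min(
        range(len(exemplars)),
        key=lambda i: math.dist(activation, exemplars[i]),
    )


def voronoi_partition(activations, exemplars):
    """Group activation indices by the Voronoi cell (nearest exemplar) they fall in."""
    cells = {i: [] for i in range(len(exemplars))}
    for idx, act in enumerate(activations):
        cells[nearest_exemplar(act, exemplars)].append(idx)
    return cells


# Toy stand-ins for model activations: two synthetic clusters in 4 dimensions.
rng = random.Random(0)
activations = [[rng.gauss(0.0, 0.1) for _ in range(4)] for _ in range(5)]
activations += [[rng.gauss(5.0, 0.1) for _ in range(4)] for _ in range(5)]

# Assumption: exemplars are two hand-picked activations, one per cluster.
# How exemplars are actually selected is not described in this summary.
exemplars = [activations[0], activations[5]]

cells = voronoi_partition(activations, exemplars)
for cell_id, members in cells.items():
    print(f"cell {cell_id}: activation indices {members}")
```

Each cell can then be inspected by looking at the inputs whose activations fall inside it, which is one plausible way such a partition could surface structure without training a dictionary.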

Original post

Voronoi partitions on activations reveal interpretable structure with orders of magnitude less compute than SAEs! Here is an introduction to a new interpretability method: https://www.lesswrong.com/posts/RroeHBSkBXXDsrryq/an-introduction-to-exemplar-partitioning-for-mechanistic-1

9:00 PM · May 15, 2026

a very neat new method with great evals on AxBench!!

Jessica Rumbelow @JessicaRumbelow

Voronoi partitions on activations reveal interpretable structure with orders of magnitude less compute than SAEs! Here is an introduction to a new interpretability method: https://www.lesswrong.com/posts/RroeHBSkBXXDsrryq/an-introduction-to-exemplar-partitioning-for-mechanistic-1

4:00 AM · May 16, 2026 · 15.8K Views
9:51 PM · May 16, 2026 · 3.9K Views

my gut feeling about feature geometry: there is great progress lately on untying this Gordian knot. but I really really hope methods with very limited presuppositions about geometry can cut through it directly

Aryaman Arora @aryaman2020

a very neat new method with great evals on AxBench!!

9:51 PM · May 16, 2026 · 3.9K Views
9:53 PM · May 16, 2026 · 280 Views