Exemplar partitioning rivals sparse autoencoders on AxBench benchmark

Exemplar partitioning applies Voronoi partitions directly to model activations to surface human-interpretable structure. On the AxBench benchmark it performs comparably to or better than sparse autoencoders (SAEs) while using orders of magnitude less compute. The method was introduced in a post on LessWrong.
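To make the core idea concrete, here is a minimal sketch of a Voronoi partition over activation vectors: each activation is assigned to the cell of its nearest exemplar. It is an illustration only, not the authors' implementation. The function name `voronoi_assign`, the synthetic "activations", the Euclidean metric, and the way exemplars are picked are all assumptions made for this example; the linked post describes the actual procedure.

```python
import numpy as np

def voronoi_assign(activations, exemplars):
    """Return, for each activation row, the index of its nearest exemplar.

    Assumes Euclidean distance; the nearest-exemplar rule is what defines
    the Voronoi cells.
    """
    # Pairwise differences, shape (n_activations, n_exemplars, dim).
    diffs = activations[:, None, :] - exemplars[None, :, :]
    # Squared Euclidean distances, shape (n_activations, n_exemplars).
    sq_dists = np.einsum("nkd,nkd->nk", diffs, diffs)
    return sq_dists.argmin(axis=1)

rng = np.random.default_rng(0)

# Synthetic stand-in for model activations: two well-separated blobs in 8 dims.
blob_a = rng.normal(loc=-5.0, scale=0.5, size=(50, 8))
blob_b = rng.normal(loc=5.0, scale=0.5, size=(50, 8))
activations = np.vstack([blob_a, blob_b])

# Hypothetical exemplars: one point taken from each blob.
exemplars = np.stack([blob_a[0], blob_b[0]])

cells = voronoi_assign(activations, exemplars)
cell_sizes = np.bincount(cells, minlength=len(exemplars))
print(cell_sizes)
```

With real model activations, one would then inspect the inputs that fall into each cell to judge whether the cell corresponds to something interpretable.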

Original post

Jessica Rumbelow@JESSICARUMBELOW

Voronoi partitions on activations reveal interpretable structure with orders of magnitude less compute than SAEs! Here is an introduction to a new interpretability method: https://www.lesswrong.com/posts/RroeHBSkBXXDsrryq/an-introduction-to-exemplar-partitioning-for-mechanistic-1

9:00 PM · May 15, 2026

Reposted by @DHADFIELDMENELL

QUOTE POST

Aryaman Arora@ARYAMAN2020

a very neat new method with great evals on AxBench!!

Jessica Rumbelow@JessicaRumbelow

4:00 AM · May 16, 2026 · 15.8K Views

9:51 PM · May 16, 2026 · 3.9K Views

Aryaman Arora@ARYAMAN2020

my gut feeling about feature geometry: there is great progress lately on untying this Gordian knot. but I really really hope methods with very limited presuppositions about geometry can cut through it directly

9:53 PM · May 16, 2026 · 280 Views