
Feature Geometry Predicts LLM Errors on Unseen Concept Combinations


Original post

There’s one small hiccup before we spin feature similarity "straw" into error-forecasting "gold": internal representations have a multiscale structure dominated by properties like prompt format. These background clusters aren’t relevant to the errors we want to predict, so we have to control for them first.

Naomi Saphra@nsaphra

This interference is quick to calculate, so we can sift through all possible concept combinations to find adversarial scenarios to stump the model. Only then do we need to actually generate, translate, or find a specific challenging input instantiating that scenario!

6:07 AM · Jun 15, 2026 · 140 Views
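A minimal sketch of the "sift through all combinations" step, assuming interference is scored as the absolute cosine similarity between two concept feature vectors. The thread does not give the exact formula, so that scoring rule, the function names, and the toy vectors below are illustrative assumptions, not the paper's method.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def rank_pairs_by_interference(features):
    """Return (pair, score) tuples sorted from most to least interfering.

    Assumption: interference = |cosine similarity| of the two feature vectors.
    """
    scored = [
        ((a, b), abs(cosine(features[a], features[b])))
        for a, b in combinations(sorted(features), 2)
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy feature directions (made-up numbers, not taken from the paper).
features = {
    "cat_facts": [1.0, 0.0, 0.0],
    "french": [0.8, 0.6, 0.0],
    "birth_year": [0.0, 0.0, 1.0],
}
ranking = rank_pairs_by_interference(features)
top_pair, top_score = ranking[0]
```

Only the highest-ranked pairs would then need a concrete prompt written or translated, which is the cost saving the post describes.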



Naomi Saphra@nsaphra

To predict errors from compositional interference, we need to control for large-scale representational structures. Geometry is dominated by formatting clusters, so we center examples within each cluster. This necessity illuminates the multiscale structure of data manifolds!
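The centering step can be sketched as subtracting each cluster's mean activation from its members. This assumes the formatting cluster of every example is already known. The cluster labels and activations here are hypothetical, and the thread does not say how the clusters are identified.

```python
from collections import defaultdict


def center_within_clusters(activations, cluster_ids):
    """Subtract each cluster's mean vector from the activations in that cluster."""
    groups = defaultdict(list)
    for vec, cid in zip(activations, cluster_ids):
        groups[cid].append(vec)
    # Per-cluster mean, computed dimension by dimension.
    means = {
        cid: [sum(col) / len(vecs) for col in zip(*vecs)]
        for cid, vecs in groups.items()
    }
    return [
        [x - m for x, m in zip(vec, means[cid])]
        for vec, cid in zip(activations, cluster_ids)
    ]


# Two made-up formatting clusters that sit far apart in activation space.
activations = [[10.0, 1.0], [12.0, 3.0], [-5.0, 0.0], [-7.0, 2.0]]
cluster_ids = ["qa_format", "qa_format", "chat_format", "chat_format"]
centered = center_within_clusters(activations, cluster_ids)
```

After centering, the large offset between the two clusters is gone and only the within-cluster variation remains, which is the scale at which the finer-grained concept geometry becomes visible.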

Naomi Saphra@nsaphra

Across languages, we can predict multilingual failures without running LLMs on the non-English inputs, just by looking at the angle between a language subspace and a prompt activation.

Our error predictions outperform majority baselines for EVERY language tested.
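One way to compute the angle between a prompt activation and a language subspace is to project the activation onto an orthonormal basis of the subspace. The sketch below assumes the subspace is given as a list of spanning vectors. How the paper estimates the subspace, and how it turns the angle into an error prediction, is not stated in the thread.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coeff = dot(w, b)
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # skip vectors already in the span
            basis.append([wi / norm for wi in w])
    return basis


def angle_to_subspace(activation, subspace_vectors):
    """Angle in radians between a nonzero vector and span(subspace_vectors)."""
    basis = orthonormal_basis(subspace_vectors)
    projected_sq = sum(dot(activation, b) ** 2 for b in basis)
    cos_angle = math.sqrt(projected_sq) / math.sqrt(dot(activation, activation))
    return math.acos(min(1.0, cos_angle))  # clamp against rounding error


# Hypothetical 3-d example: the "language subspace" is the x-y plane.
language_subspace = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
prompt_activation = [1.0, 0.0, 1.0]
angle = angle_to_subspace(prompt_activation, language_subspace)
```

In this toy case the activation lies halfway between the plane and its normal, so the angle is 45 degrees.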


Naomi Saphra@nsaphra

Our new paper sets the stage for the biggest practical use case of model interpretability: stress testing and dataset development. All you need is interpretable linear features and simple geometry. https://arxiv.org/abs/2606.13934


Naomi Saphra@nsaphra

We first test these error predictions on a toy compositional task. When we group examples by the interference among their atomic concept representations, each model has lower accuracy on higher-interference subsets, across training settings.
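The grouping analysis can be sketched as sorting examples by an interference score, splitting them into equal-size bins, and comparing accuracy per bin. The scores and correctness labels below are invented for illustration. The function name and the equal-size binning rule are assumptions, and the sketch expects at least as many examples as bins.

```python
def accuracy_by_interference(scores, correct, n_bins=2):
    """Split examples into equal-size bins by interference score.

    Returns one accuracy per bin, ordered from lowest to highest interference.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    accuracies = []
    for b in range(n_bins):
        start = b * len(order) // n_bins
        stop = (b + 1) * len(order) // n_bins
        members = order[start:stop]
        accuracies.append(sum(correct[i] for i in members) / len(members))
    return accuracies


# Made-up per-example interference scores and model correctness (1 = correct).
scores = [0.05, 0.10, 0.20, 0.70, 0.80, 0.90]
correct = [1, 1, 1, 0, 1, 0]
low_acc, high_acc = accuracy_by_interference(scores, correct)
```

If interference is predictive in the way the post reports, the high-interference bin should show the lower accuracy, as it does in this toy data.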

Naomi Saphra@nsaphra

Does an LLM know cat facts when speaking French? We'll use feature geometry to answer, without evaluating specific inputs. Flagged combos provide scalable, targeted stress tests: a win in a data-bottlenecked world. Imagine trying just 5% of scenarios to find every error!


Naomi Saphra@nsaphra

Exciting new work led by @jenniferlumeng (who didn’t want to post this thread but you should follow her) with support from @ruochenz_, @wordscompute, Ellie Pavlick, @elmelis and me. (From @Brown_NLP @KempnerInst @BU_CDS)

Naomi Saphra@nsaphra

Beyond individual examples, compositional interference also predicts dataset-level difficulty: in both multilingual fact recall and multihop reasoning, higher interference among coarse-grained concept subspaces (e.g., "birth year facts" and "Japanese") predicts lower set accuracy.
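For subspace-level interference, one possible score is the mean squared cosine of the principal angles between two subspaces. This is a standard overlap measure chosen here for illustration. The thread does not say which measure the paper uses, and the subspaces below are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coeff = dot(w, b)
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


def subspace_overlap(vectors_a, vectors_b):
    """Mean squared cosine of the principal angles between two subspaces.

    0 means orthogonal subspaces; 1 means the smaller lies inside the larger.
    Assumes both inputs span at least one dimension.
    """
    qa = orthonormal_basis(vectors_a)
    qb = orthonormal_basis(vectors_b)
    # Squared Frobenius norm of Qa^T Qb equals the sum of cos^2 of principal angles.
    total = sum(dot(a, b) ** 2 for a in qa for b in qb)
    return total / min(len(qa), len(qb))


# Hypothetical concept subspaces sharing one of their two directions.
birth_year_facts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
japanese = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
overlap = subspace_overlap(birth_year_facts, japanese)
```

Here the two planes share exactly one direction, so the overlap is 0.5. Under the post's claim, concept sets with higher overlap would be expected to show lower set-level accuracy.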


Naomi Saphra@nsaphra

We can predict LLM errors on multihop reasoning prompts like "What year was the author of 1984 born?", which combines the queries "Who wrote 1984?" and "What year was George Orwell born?" Interference between the two queries predicts LLM errors on this task, too!


Naomi Saphra@nsaphra

To recall a fact in a specific language, LLMs translate and retrieve knowledge in a sensitive pipeline. When any query can be in any language, it is expensive to translate and find every error, but we can PREDICT them from interference between the language and the English fact.
