Tech · 3h ago

Tom McGrath, Goodfire co-founder, introduces predictive data debugging to inspect DPO datasets before training

Story Overview

Goodfire is extending its Silico platform with a probe-based technique that forecasts how specific DPO training examples will shape downstream behaviors such as safety compliance or hallucination rates. Before any gradient step is taken, its predictions reach an R² of roughly 0.9 against the behavior changes actually observed after training.

Original post
Tom McGrath@banburismus_

goblins? in my training data? it's more likely than you think!

Goodfire@GoodfireAI

Have you debugged your training data? You might not like what you find.

Introducing predictive data debugging: reveal and shape what your model will learn before training.

In DPO datasets, we found broken guardrails, hallucinations, and fish fart fan fiction (seriously). (1/9)

10:07 AM · Jun 11, 2026 · 1.9K Views

Developer Impact

Probes surface hidden data problems early

The method flags broken guardrails, unexpected content, and other patterns inside preference datasets so teams can filter or edit examples ahead of time rather than retraining after issues appear.
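As a minimal sketch of the "filter or edit ahead of time" workflow, assume each preference pair already carries a per-pair risk score from some upstream analysis. The `PreferencePair` type, its `risk` field, the `triage` helper, and the 0.5 threshold are all hypothetical and not part of Silico's interface. Flagged pairs are set aside, highest risk first, for review or editing before training.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreferencePair:
    """One DPO example; `risk` is a hypothetical score from an upstream analysis."""
    prompt: str
    chosen: str
    rejected: str
    risk: float


def triage(pairs: list[PreferencePair], threshold: float) -> tuple[list[PreferencePair], list[PreferencePair]]:
    """Split pairs into (kept, flagged); flagged pairs are sorted by descending risk."""
    kept = [p for p in pairs if p.risk < threshold]
    flagged = sorted((p for p in pairs if p.risk >= threshold), key=lambda p: p.risk, reverse=True)
    return kept, flagged


# Made-up pairs and scores for illustration.
dataset = [
    PreferencePair("Summarize this article.", "A concise summary.", "An off-topic reply.", 0.05),
    PreferencePair("Write a short story about a pond.", "A vivid scene.", "I can't help with that.", 0.92),
    PreferencePair("How do I bypass a content filter?", "Step-by-step instructions.", "I can't help with that.", 0.71),
]

kept, flagged = triage(dataset, threshold=0.5)
print(len(kept), [p.risk for p in flagged])
```

Sorting the flagged set lets a reviewer start with the pairs most likely to be driving an unwanted behavior instead of scanning the whole dataset.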

Open Question

Wider rollout details stay under wraps

No public pricing, self-serve access, or timeline for integration with third-party training pipelines has been announced, so it remains unclear how quickly labs outside the current Silico user base could adopt the approach.

Sentiment

Commenters broadly praise Goodfire AI's predictive data debugging tool for tackling the critical but often-ignored effect of training-data quality on model performance, and some recount past frustrations with bad data.

Positive: 100.0% · Negative: 0.0% (9 comments with sentiment)
Cluster Engagement
Posts from X
Deedy@deedydas

The quality of your data directly dictates the quality of your AI model.

But the way data affects model performance is hand-wavy voodoo at worst and intuition at best.

This new research now lets you debug your data BEFORE you spend a fortune on an irreversible training run.

3h · 12.4K Views · 59 Likes · 38 Bookmarks · 40 Retweets
Goodfire@GoodfireAI

Have you debugged your training data? You might not like what you find.

Introducing predictive data debugging: reveal and shape what your model will learn before training.

In DPO datasets, we found broken guardrails, hallucinations, and fish fart fan fiction (seriously). (1/9)

3h · 53.1K Views · 447 Likes · 312 Bookmarks
Goodfire@GoodfireAI

#4: fart fishing

Buried in Dolci is a cluster of very specific fan fiction, where characters fart in ponds, causing fish to die from the smell.

The chosen responses in the dataset wrote vivid scenes, while the rejected refused, teaching the model to comply! (7/9)

3h · 9.4K Views · 41 Likes · 7 Bookmarks
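The chosen/rejected labeling matters because of how the DPO objective works: each pair raises the policy's log-probability of the chosen response relative to the rejected one, measured against a frozen reference model. Below is a minimal per-pair sketch of that published loss. It assumes summed response log-probabilities are already available, and the example values and the β = 0.1 default are illustrative assumptions.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio)).

    Each argument is the summed token log-probability of a full response.
    beta = 0.1 is an assumed default, not a value taken from the thread.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    return softplus(-margin)  # -log(sigmoid(m)) == log(1 + exp(-m))


# Made-up log-probabilities. Policy identical to the reference: loss equals log(2).
print(dpo_loss(-50.0, -40.0, -50.0, -40.0))
# Policy now favors the chosen (vivid story) response more than the reference did,
# relative to the rejected refusal, so the loss drops below log(2).
print(dpo_loss(-40.0, -40.0, -50.0, -40.0))
```

Because minimizing this loss always pushes toward whatever the dataset marks as chosen, a cluster of pairs where the refusal is "rejected" trains compliance regardless of what the content is.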
Tom McGrath@banburismus_

this was going to be the banner figure but apparently that's 'not professional'

3h · 2.6K Views · 50 Likes · 5 Bookmarks
Goodfire@GoodfireAI

Read the full blog post on predictive data debugging: https://www.goodfire.ai/research/predictive-data-debugging

3h · 1K Views · 32 Likes · 8 Bookmarks
Goodfire@GoodfireAI

Predictive data debugging reveals which behaviors DPO will amplify or suppress before you train (R² = 0.9 vs what the model actually learns).

It then traces behaviors to responsible data, and modulates learning to prevent undesired effects. (2/9)

3h · 2K Views · 43 Likes · 4 Bookmarks
Goodfire@GoodfireAI

@jiaxinwen22 the full (73-page) paper is on arXiv! https://arxiv.org/abs/2606.12360

3h · 702 Views · 13 Likes · 7 Bookmarks
Christopher Potts@ChrisGPotts

An innovative use of interp tools to predict the effects of training! Training-data questions feel like the next frontier for interp as a field.

1h · 1.2K Views · 26 Likes · 3 Bookmarks
Goodfire@GoodfireAI

Nobody decided to teach a model this behavior - it came from a few hundred examples in a 260k-pair dataset.

You can't write an eval for a behavior you don't know exists!

Behavioral testing finds problems you expected; data debugging finds the ones you didn’t. (8/9)

3h · 2.1K Views · 33 Likes · 2 Bookmarks
Goodfire@GoodfireAI

The key idea: interpreting a model also lets us interpret a dataset.

Passing data through an interpreted model reveals what the model computes when processing each example.

Those concepts predict what the model will move toward, or away from, if you train on that data. (3/9)

3h · 1.1K Views · 32 Likes · 1 Bookmark
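The "key idea" post can be illustrated with a toy heuristic. This is a hypothetical reading, not Goodfire's actual method. DPO pushes a model toward chosen responses and away from rejected ones, so assume an interpreted model has already mapped each response to a vector of concept activations. The per-pair difference (chosen minus rejected) then gives a signed guess at which concepts that pair pushes toward or away from. Averaging over pairs gives a dataset-level prediction, and ranking pairs by their gap traces a concept back to the data behind it. The concept names and activation values are invented for illustration.

```python
# Hypothetical concept labels; a real interpreted model would supply its own features.
CONCEPTS = ["refusal", "vivid_narrative"]


def predicted_shift(chosen: list[list[float]], rejected: list[list[float]]) -> list[float]:
    """Mean (chosen - rejected) activation per concept across all preference pairs.

    Positive values suggest training would push the model toward that concept,
    negative values away from it (an assumed heuristic, not Goodfire's method).
    """
    n_pairs = len(chosen)
    n_concepts = len(chosen[0])
    return [
        sum(c[j] - r[j] for c, r in zip(chosen, rejected)) / n_pairs
        for j in range(n_concepts)
    ]


def top_contributors(chosen: list[list[float]], rejected: list[list[float]],
                     concept: int, k: int) -> list[int]:
    """Indices of the k pairs with the largest chosen-minus-rejected gap for one concept."""
    gaps = [c[concept] - r[concept] for c, r in zip(chosen, rejected)]
    return sorted(range(len(gaps)), key=lambda i: gaps[i], reverse=True)[:k]


# Made-up activations for three pairs: rows are pairs, columns follow CONCEPTS.
chosen_acts = [[0.1, 0.9], [0.8, 0.1], [0.0, 0.7]]
rejected_acts = [[0.9, 0.1], [0.7, 0.2], [0.8, 0.0]]

shift = predicted_shift(chosen_acts, rejected_acts)
for name, value in zip(CONCEPTS, shift):
    print(f"{name}: {value:+.3f}")
print(top_contributors(chosen_acts, rejected_acts, concept=1, k=2))
```

Here the toy data predicts movement away from refusal and toward vivid narrative, and pairs 0 and 2 are the ones responsible for the latter.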
Goodfire@GoodfireAI

Example #1: breaking safety guardrails

You might expect your "alignment" data to strengthen guardrails. If you use Dolci or Tulu, you’d be disappointed. DPO on these data made models more susceptible to jailbreaks.

But knowing what specific data caused it, we can fix it! (4/9)

3h · 1.3K Views · 31 Likes · 1 Bookmark
Goodfire@GoodfireAI

#3: sycophancy, but just for physics

This data makes models sycophantic, but only for pseudo-profound physics questions like "if the universe is governed by laws, where are they located?"

Localized sycophancy is easy to miss - predictive data debugging surfaced it easily. (6/9)

3h · 1K Views · 30 Likes · 1 Bookmark
Goodfire@GoodfireAI

If you train models on preference data, you have a curriculum you've never read. Predictive data debugging lets you read it, understand it, and rewrite it. We've built it into Silico, our platform for model design.

Request access to Silico here: https://www.goodfire.ai/silico (9/9)

3h · 1.1K Views · 28 Likes · 1 Bookmark
Goodfire@GoodfireAI

#2: hallucinated links

After DPO, the model produced links far more often when responding to prompts asking for resources on sensitive topics.

But the URLs were almost always hallucinated. The model had only learned to *look* helpful. (5/9)

3h · 974 Views · 31 Likes
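A cruder, purely data-level check can complement this. It is sketched here as an assumption, not as Goodfire's method: counting how often chosen versus rejected responses contain URLs shows whether a preference dataset systematically rewards link-heavy answers. The regex and the example pairs are illustrative. The check cannot tell whether a link is real, which is exactly the gap the post describes.

```python
import re

# Simple illustrative pattern; it will not catch every URL form.
URL_PATTERN = re.compile(r"https?://\S+")


def url_rate(responses: list[str]) -> float:
    """Fraction of responses containing at least one URL."""
    if not responses:
        return 0.0
    return sum(1 for text in responses if URL_PATTERN.search(text)) / len(responses)


def url_preference_gap(pairs: list[tuple[str, str]]) -> float:
    """Chosen URL rate minus rejected URL rate; positive means links are being rewarded."""
    chosen = [c for c, _ in pairs]
    rejected = [r for _, r in pairs]
    return url_rate(chosen) - url_rate(rejected)


# Made-up (chosen, rejected) pairs.
pairs = [
    ("See https://example.org/help for support.", "I can't share resources on that."),
    ("Try https://example.com/guide for details.", "Here is a summary without links."),
    ("Here is some general advice.", "Here is some general advice."),
]
print(f"{url_preference_gap(pairs):+.3f}")
```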

Super excited about this work! This paper was driven by a claim I've been making to anyone who'll listen: "interpretability is the language of data". (1/3)

3h · 4.3K Views · 77 Likes · 31 Bookmarks

We use interp to understand models---but models are a compression of the data they were trained on, i.e., we can use models to *interpret data* and *make predictions* about what a training run, before even being launched, will teach a model!

3h · 574 Views · 19 Likes · 1 Bookmark

This claim's been driving a lot of my recent thinking on interp, and I'm super excited to show interp has predictive affordances: e.g., we predict *extremely surprising things* a model will learn from a dataset and confirm these behaviorally!

3h · 404 Views · 15 Likes
elie@eliebakouch

very cool work, always nice to see new ways to "look at the data"

17m · 279 Views · 10 Likes · 1 Bookmark
Goodfire@GoodfireAI

@0xGTO 🧌

3h · 273 Views · 12 Likes · 1 Bookmark