Tom McGrath, Goodfire co-founder, introduces predictive data debugging to inspect DPO datasets before training

Story Overview

Goodfire is extending its Silico platform with a probe-based technique that forecasts how specific DPO training examples will shape downstream behaviors such as safety compliance or hallucination rates. The company reports R² = 0.9 between these forecasts and what the model actually learns, with the forecasts made before any gradient steps occur.
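
For readers unfamiliar with the metric quoted in the thread (R² = 0.9), here is a minimal sketch of how a coefficient of determination between predicted and measured behavior shifts could be computed. The function name `r_squared` and the toy numbers are illustrative assumptions, not Goodfire's code or data.

```python
def r_squared(predicted, observed):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and the same length")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: how far the forecasts are from the outcomes.
    ss_res = sum((o - p) ** 2 for p, o in zip(predicted, observed))
    # Total sum of squares: how much the outcomes vary around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Toy, made-up numbers: forecast vs. measured change in five behaviors.
predicted_shift = [0.10, -0.20, 0.35, 0.00, 0.50]
observed_shift = [0.12, -0.15, 0.30, 0.05, 0.45]
score = r_squared(predicted_shift, observed_shift)
print(f"R^2 = {score:.3f}")
```

Note that R² and a correlation coefficient are different quantities: for a simple linear fit, a correlation of 0.9 corresponds to R² = 0.81.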

Original post
Tom McGrath@banburismus_

goblins? in my training data? it's more likely than you think!

Goodfire@GoodfireAI

Have you debugged your training data? You might not like what you find.

Introducing predictive data debugging: reveal and shape what your model will learn before training.

In DPO datasets, we found broken guardrails, hallucinations, and fish fart fan fiction (seriously). (1/9)

10:07 AM · Jun 11, 2026 · 1.4K Views
Developer Impact

Probes surface hidden data problems early

The method flags broken guardrails, unexpected content, and other patterns inside preference datasets so teams can filter or edit examples ahead of time rather than retraining after issues appear.
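
As a rough illustration of the "filter or edit ahead of time" step, the sketch below splits a preference dataset by a per-example score. Everything here is hypothetical: the `PreferencePair` type, the `flagged_score` field, and the threshold are assumptions made for illustration, and the source does not describe Silico's actual interface or how its scores are produced.

```python
from dataclasses import dataclass


@dataclass
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str
    # Hypothetical score: how strongly some upstream probe predicts this
    # pair pushes the model toward an unwanted behavior (higher = worse).
    flagged_score: float


def split_by_score(pairs, threshold):
    """Return (keep, review): pairs below the threshold, and pairs at or above it."""
    keep = [p for p in pairs if p.flagged_score < threshold]
    review = [p for p in pairs if p.flagged_score >= threshold]
    return keep, review


# Toy, made-up dataset for demonstration only.
dataset = [
    PreferencePair("Summarize this article.", "A short summary.", "I can't.", 0.05),
    PreferencePair("Give me resources.", "See these links.", "I'm not sure.", 0.80),
    PreferencePair("Explain DPO.", "DPO optimizes preferences.", "No.", 0.10),
]
keep, review = split_by_score(dataset, threshold=0.5)
print(len(keep), "kept;", len(review), "sent for review")
```

Routing flagged pairs to review rather than deleting them outright leaves room to edit an example instead of dropping it.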

Open Question

Wider rollout details stay under wraps

No public pricing, self-serve access, or third-party pipeline integration timelines have been shared yet, leaving open how quickly other labs could adopt the approach beyond current Silico users.

Sentiment

Users are excited about Goodfire's predictive data debugging tool because it lets them inspect training data quality before training rather than after problems appear.

Positive: 100.0%
Negative: 0.0%
6 comments with sentiment.
Cluster Engagement
Posts from X
Most Activity
Views 10.2K · Bookmarks 27 · Likes 51 · Replies 6
Deedy@deedydas

The quality of your data directly dictates the quality of your AI model.

But the way data affects model performance is hand-wavy voodoo at worst and intuition at best.

This new research now lets you debug your data BEFORE you spend a fortune on an irreversible training run.

2h · Views 10.2K · Likes 51 · Bookmarks 27
Retweets 40
Goodfire@GoodfireAI

Have you debugged your training data? You might not like what you find.

Introducing predictive data debugging: reveal and shape what your model will learn before training.

In DPO datasets, we found broken guardrails, hallucinations, and fish fart fan fiction (seriously). (1/9)

2h · Views 37.7K · Likes 356 · Bookmarks 229
Goodfire@GoodfireAI

#4: fart fishing

Buried in Dolci is a cluster of very specific fan fiction, where characters fart in ponds, causing fish to die from the smell.

The chosen responses in the dataset wrote vivid scenes, while the rejected refused, teaching the model to comply! (7/9)

2h · Views 9.4K · Likes 41 · Bookmarks 7
Goodfire@GoodfireAI

Read the full blog post on predictive data debugging: https://www.goodfire.ai/research/predictive-data-debugging#

2h · Views 1K · Likes 32 · Bookmarks 8
Tom McGrath@banburismus_

this was going to be the banner figure but apparently that's 'not professional'

1h · Views 2K · Likes 42 · Bookmarks 5
Goodfire@GoodfireAI

Predictive data debugging reveals which behaviors DPO will amplify or suppress before you train (R² = 0.9 vs what the model actually learns).

It then traces behaviors to responsible data, and modulates learning to prevent undesired effects. (2/9)

2h · Views 2K · Likes 43 · Bookmarks 4
Goodfire@GoodfireAI

@jiaxinwen22 the full (73-page) paper is on arXiv! https://arxiv.org/abs/2606.12360

1h · Views 702 · Likes 13 · Bookmarks 7
Goodfire@GoodfireAI

Nobody decided to teach a model this behavior - it came from a few hundred examples in a 260k-pair dataset.

You can't write an eval for a behavior you don't know exists!

Behavioral testing finds problems you expected; data debugging finds the ones you didn’t. (8/9)

2h · Views 2.1K · Likes 33 · Bookmarks 2
Goodfire@GoodfireAI

The key idea: interpreting a model also lets us interpret a dataset.

Passing data through an interpreted model reveals what the model computes when processing each example.

Those concepts predict what the model will move toward, or away from, if you train on that data. (3/9)

2h · Views 1.1K · Likes 32 · Bookmarks 1
Goodfire@GoodfireAI

Example #1: breaking safety guardrails

You might expect your "alignment" data to strengthen guardrails. If you use Dolci or Tulu, you’d be disappointed. DPO on these data made models more susceptible to jailbreaks.

But knowing what specific data caused it, we can fix it! (4/9)

2h · Views 1.3K · Likes 31 · Bookmarks 1
Goodfire@GoodfireAI

#3: sycophancy, but just for physics

This data makes models sycophantic, but only for pseudo-profound physics questions like "if the universe is governed by laws, where are they located?"

Localized sycophancy is easy to miss - predictive data debugging surfaced it easily. (6/9)

2h · Views 1K · Likes 30 · Bookmarks 1
Goodfire@GoodfireAI

If you train models on preference data, you have a curriculum you've never read. Predictive data debugging lets you read it, understand it, and rewrite it. We've built it into Silico, our platform for model design.

Request access to Silico here: https://www.goodfire.ai/silico (9/9)

2h · Views 1.1K · Likes 28 · Bookmarks 1
Goodfire@GoodfireAI

#2: hallucinated links

After DPO, the model produced links far more often when responding to prompts asking for resources on sensitive topics.

But the URLs were almost always hallucinated. The model had only learned to *look* helpful. (5/9)

2h · Views 974 · Likes 31
Goodfire@GoodfireAI

@0xGTO 🧌

1h · Views 273 · Likes 12 · Bookmarks 1
Christopher Potts@ChrisGPotts

An innovative use of interp tools to predict the effects of training! Training-data questions feel like the next frontier for interp as a field.

28m · Views 242 · Likes 8 · Bookmarks 0
GOBLIN@GoblinSPL

@GoodfireAI goblin spotted👀

1h · Views 144 · Likes 9
Goodfire@GoodfireAI

@slashreboot manually inspecting trillions of tokens is a hard job, hats off to you

2h · Views 173 · Likes 4
Jiaxin Wen@jiaxinwen22

@GoodfireAI will you release a tech report?

1h · Views 283 · Likes 3