Tech · 2h ago

Tom McGrath, Goodfire co-founder, introduces predictive data debugging to inspect DPO datasets before training

Story Overview

Goodfire is extending its Silico platform with a probe-based technique that predicts how specific DPO training examples will shape downstream behaviors such as safety compliance or hallucination rates. The company reports that these predictions reach R² = 0.9 against what the model actually learns, before any gradient steps are taken.

Original post

Tom McGrath@banburismus_

goblins? in my training data? it's more likely than you think!

Goodfire@GoodfireAI

Have you debugged your training data? You might not like what you find.

Introducing predictive data debugging: reveal and shape what your model will learn before training.

In DPO datasets, we found broken guardrails, hallucinations, and fish fart fan fiction (seriously). (1/9)

10:07 AM · Jun 11, 2026 · 1.4K Views

Developer Impact

Probes surface hidden data problems early

The method flags broken guardrails, unexpected content, and other patterns inside preference datasets so teams can filter or edit examples ahead of time rather than retraining after issues appear.
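
As a rough illustration of filtering ahead of training, the sketch below splits preference pairs on a per-example score. The `PreferencePair` type, the `risk_score` field, and the threshold are hypothetical names chosen for this example; nothing here reflects Silico's actual interface, which has not been published.

```python
from dataclasses import dataclass


@dataclass
class PreferencePair:
    prompt: str
    chosen: str
    rejected: str
    # Hypothetical score: how strongly this pair is predicted to push the
    # model toward an undesired behavior (higher means more concerning).
    risk_score: float


def split_by_risk(pairs, threshold):
    """Return (kept, flagged); flagged pairs meet or exceed the threshold."""
    kept = [p for p in pairs if p.risk_score < threshold]
    flagged = [p for p in pairs if p.risk_score >= threshold]
    return kept, flagged


# Made-up examples and scores, for illustration only.
pairs = [
    PreferencePair("How do I cite a source?", "Use this format.", "I can't help.", 0.05),
    PreferencePair("Write the pond scene.", "Vivid scene.", "Refusal.", 0.92),
    PreferencePair("Explain DPO.", "Clear answer.", "Vague answer.", 0.10),
]

kept, flagged = split_by_risk(pairs, threshold=0.5)
print(f"kept {len(kept)} pairs, flagged {len(flagged)} for review")
```

Flagged pairs could then be dropped or edited by hand before the training run, which is the workflow the summary describes.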

Open Question

Wider rollout details stay under wraps

No public pricing, self-serve access, or third-party pipeline integration timelines have been shared yet, leaving open how quickly other labs could adopt the approach beyond current Silico users.

Sentiment

Users are excited about Goodfire AI's Predictive Data Debugging tool because it helps inspect training data quality upfront to improve model performance.

Positive: 100.0%

Negative: 0.0%

Based on 6 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 10.2K · Bookmarks: 27 · Likes: 51 · Replies: 6

Deedy@deedydas

The quality of your data directly dictates the quality of your AI model.

But the way data affects model performance is hand-wavy voodoo at worst and intuition at best.

This new research now lets you debug your data BEFORE you spend a fortune on an irreversible training run.

Goodfire@GoodfireAI

#4: fart fishing

Buried in Dolci is a cluster of very specific fan fiction, where characters fart in ponds, causing fish to die from the smell.

The chosen responses in the dataset wrote vivid scenes, while the rejected refused, teaching the model to comply! (7/9)
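
To see why such pairs teach compliance, it helps to look at the standard DPO objective, which rewards the policy for raising the likelihood of the chosen response relative to the rejected one, measured against a frozen reference model. The sketch below computes that per-pair loss from summed log-probabilities; the numeric values are made up, and `beta=0.1` is only an assumed setting.

```python
import math


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-pair DPO loss from sequence log-probabilities.

    margin = (policy gain on chosen) - (policy gain on rejected),
    where each gain is measured relative to the reference model.
    """
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    # -log(sigmoid(beta * margin)) written as log(1 + exp(-beta * margin)).
    return math.log1p(math.exp(-beta * margin))


# Policy identical to the reference: the margin is zero.
baseline = dpo_loss(-5.0, -7.0, -5.0, -7.0)

# Policy now prefers the chosen (compliant) text more and the rejected
# (refusing) text less than the reference does.
shifted = dpo_loss(-4.0, -8.0, -5.0, -7.0)

print(f"baseline loss: {baseline:.3f}, after shifting toward chosen: {shifted:.3f}")
```

Because the loss falls as the margin grows, gradient descent moves the model toward whatever the chosen responses do. When the chosen side writes the scene and the rejected side refuses, that direction is compliance.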

Goodfire@GoodfireAI

Read the full blog post on predictive data debugging: https://www.goodfire.ai/research/predictive-data-debugging#

Tom McGrath@banburismus_

this was going to be the banner figure but apparently that's 'not professional'

Goodfire@GoodfireAI

Predictive data debugging reveals which behaviors DPO will amplify or suppress before you train (R² = 0.9 vs what the model actually learns).

It then traces behaviors to responsible data, and modulates learning to prevent undesired effects. (2/9)
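
For readers unfamiliar with the metric, R² (the coefficient of determination) measures how much of the variation in observed outcomes the predictions account for, with 1.0 meaning a perfect match. The sketch below uses the usual definition on made-up numbers; the post does not say exactly how Goodfire computed its figure, so this is only an illustration of the metric, not a reproduction of the result.

```python
def r_squared(predicted, observed):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for p, o in zip(predicted, observed))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical per-behavior shifts: predicted before training vs. measured after.
predicted_shift = [0.8, -0.3, 0.1, 0.5, -0.6]
observed_shift = [0.7, -0.2, 0.2, 0.6, -0.7]

print(f"R^2 on toy data: {r_squared(predicted_shift, observed_shift):.2f}")
```

A value of 0 would mean the predictions do no better than always guessing the average observed shift.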

Goodfire@GoodfireAI

@jiaxinwen22 the full (73-page) paper is on arXiv! https://arxiv.org/abs/2606.12360

Goodfire@GoodfireAI

Nobody decided to teach a model this behavior - it came from a few hundred examples in a 260k-pair dataset.

You can't write an eval for a behavior you don't know exists!

Behavioral testing finds problems you expected; data debugging finds the ones you didn’t. (8/9)
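
To put "a few hundred examples in a 260k-pair dataset" in proportion, the short calculation below tries a few assumed readings of "a few hundred". The post gives no exact count, so 200, 300, and 500 are illustrative guesses, not reported figures.

```python
def share_percent(n_examples, dataset_size):
    """Percentage of the dataset made up by n_examples."""
    return 100.0 * n_examples / dataset_size


DATASET_SIZE = 260_000  # pair count stated in the post

for n in (200, 300, 500):  # assumed readings of "a few hundred"
    print(f"{n} pairs -> {share_percent(n, DATASET_SIZE):.2f}% of the dataset")
```

Under these assumptions the cluster makes up between about 0.08% and 0.19% of the data, which is small enough that random spot checks would be unlikely to surface it.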

Goodfire@GoodfireAI

The key idea: interpreting a model also lets us interpret a dataset.

Passing data through an interpreted model reveals what the model computes when processing each example.

Those concepts predict what the model will move toward, or away from, if you train on that data. (3/9)
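
One simple way to picture this idea, offered as a toy sketch and not as Goodfire's actual method: suppose each response in a pair has been scored for a handful of named concepts, and the predicted direction of learning for a concept is the average of its chosen-minus-rejected activation. The concept names, activation values, and the `predicted_shift` helper below are all hypothetical.

```python
from collections import defaultdict


def predicted_shift(pairs):
    """Average (chosen - rejected) activation per concept across pairs.

    Each pair is (chosen_activations, rejected_activations), both dicts
    mapping a concept name to a float. Missing concepts count as 0.0.
    """
    totals = defaultdict(float)
    for chosen_acts, rejected_acts in pairs:
        for concept in set(chosen_acts) | set(rejected_acts):
            totals[concept] += chosen_acts.get(concept, 0.0) - rejected_acts.get(concept, 0.0)
    return {concept: total / len(pairs) for concept, total in totals.items()}


# Made-up activations for two preference pairs.
pairs = [
    ({"refusal": 0.1, "compliance": 0.9}, {"refusal": 0.8, "compliance": 0.2}),
    ({"refusal": 0.0, "compliance": 1.0}, {"refusal": 1.0, "compliance": 0.0}),
]

shift = predicted_shift(pairs)
for concept, value in sorted(shift.items()):
    direction = "toward" if value > 0 else "away from"
    print(f"{concept}: {value:+.2f} ({direction})")
```

In this toy data the chosen responses score higher on compliance and lower on refusal, so the sketch predicts movement toward the former and away from the latter.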

Goodfire@GoodfireAI

Example #1: breaking safety guardrails

You might expect your "alignment" data to strengthen guardrails. If you use Dolci or Tulu, you’d be disappointed. DPO on these data made models more susceptible to jailbreaks.

But knowing what specific data caused it, we can fix it! (4/9)

Goodfire@GoodfireAI

#3: sycophancy, but just for physics

This data makes models sycophantic, but only for pseudo-profound physics questions like "if the universe is governed by laws, where are they located?"

Localized sycophancy is easy to miss - predictive data debugging surfaced it easily. (6/9)

Goodfire@GoodfireAI

If you train models on preference data, you have a curriculum you've never read. Predictive data debugging lets you read it, understand it, and rewrite it. We've built it into Silico, our platform for model design.

Request access to Silico here: https://www.goodfire.ai/silico (9/9)

Goodfire@GoodfireAI

#2: hallucinated links

After DPO, the model produced links far more often when responding to prompts asking for resources on sensitive topics.

But the URLs were almost always hallucinated. The model had only learned to *look* helpful. (5/9)

Goodfire@GoodfireAI

@0xGTO 🧌
