Tech · 3h ago

Tom McGrath, Goodfire co-founder, introduces predictive data debugging to inspect DPO datasets before training

Story Overview

Goodfire is extending its Silico platform with a probe-based technique that forecasts how specific DPO training examples will shape downstream behaviors such as safety compliance or hallucination rates. According to the company, the predictions reach R² = 0.9 against what the model actually learns, before any gradient steps occur.

Original post

Tom McGrath@banburismus_ · #1810 in Tech

goblins? in my training data? it's more likely than you think!

Goodfire@GoodfireAI

Have you debugged your training data? You might not like what you find.

Introducing predictive data debugging: reveal and shape what your model will learn before training.

In DPO datasets, we found broken guardrails, hallucinations, and fish fart fan fiction (seriously). (1/9)

10:07 AM · Jun 11, 2026 · 1.9K Views

Developer Impact

Probes surface hidden data problems early

The method flags broken guardrails, unexpected content, and other patterns inside preference datasets so teams can filter or edit examples ahead of time rather than retraining after issues appear.

Open Question

Wider rollout details stay under wraps

No public pricing, self-serve access, or third-party pipeline integration timelines have been shared yet, leaving open how quickly other labs could adopt the approach beyond current Silico users.

Sentiment

Many users praise Goodfire AI's predictive data debugging tool because it addresses the critical but often-ignored impact of training data quality on model performance, with some sharing past frustrations from bad data.

Positive: 100.0%

Negative: 0.0%

9 comments with sentiment.

Cluster Engagement

Posts from X

Most Activity

Views: 12.4K · Bookmarks: 38 · Likes: 59 · Replies: 6

Deedy@deedydas

The quality of your data directly dictates the quality of your AI model.

But the way data affects model performance is hand-wavy voodoo at worst and intuition at best.

This new research now lets you debug your data BEFORE you spend a fortune on an irreversible training run.

Goodfire@GoodfireAI

#4: fart fishing

Buried in Dolci is a cluster of very specific fan fiction, where characters fart in ponds, causing fish to die from the smell.

The chosen responses in the dataset wrote vivid scenes, while the rejected refused, teaching the model to comply! (7/9)

Tom McGrath@banburismus_

this was going to be the banner figure but apparently that's 'not professional'

Goodfire@GoodfireAI

Read the full blog post on predictive data debugging: https://www.goodfire.ai/research/predictive-data-debugging#

Goodfire@GoodfireAI

Predictive data debugging reveals which behaviors DPO will amplify or suppress before you train (R² = 0.9 vs what the model actually learns).

It then traces behaviors to responsible data, and modulates learning to prevent undesired effects. (2/9)
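For readers unfamiliar with the metric: R² (the coefficient of determination) measures how much of the variance in observed outcomes a set of predictions accounts for. The sketch below shows the standard calculation in plain Python. The function name `r_squared` and the numbers are illustrative assumptions, not values from the Goodfire paper.

```python
def r_squared(predicted, observed):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and the same length")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: how far predictions miss the observations.
    ss_res = sum((o - p) ** 2 for p, o in zip(predicted, observed))
    # Total sum of squares: how much the observations vary around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


# Hypothetical per-behavior shifts (made-up numbers, for illustration only).
predicted_shift = [0.8, -0.3, 0.5, 0.1, -0.6]
observed_shift = [0.7, -0.2, 0.6, 0.0, -0.5]
score = r_squared(predicted_shift, observed_shift)
```

Note that R² is not the same as a correlation coefficient: for a simple linear fit, a correlation of 0.9 corresponds to an R² of 0.81.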

Goodfire@GoodfireAI

@jiaxinwen22 the full (73-page) paper is on arXiv! https://arxiv.org/abs/2606.12360

Christopher Potts@ChrisGPotts

An innovative use of interp tools to predict the effects of training! Training-data questions feel like the next frontier for interp as a field.

Goodfire@GoodfireAI

Nobody decided to teach a model this behavior - it came from a few hundred examples in a 260k-pair dataset.

You can't write an eval for a behavior you don't know exists!

Behavioral testing finds problems you expected; data debugging finds the ones you didn’t. (8/9)

Goodfire@GoodfireAI

The key idea: interpreting a model also lets us interpret a dataset.

Passing data through an interpreted model reveals what the model computes when processing each example.

Those concepts predict what the model will move toward, or away from, if you train on that data. (3/9)
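One way to picture the idea, as a rough sketch rather than Goodfire's actual method: suppose each response in a preference pair has already been scored for a handful of named concepts by some interpretability tool. Averaging the chosen-minus-rejected difference per concept gives a crude guess at which concepts the data pushes a model toward or away from. The concept names, the scores, and the `predicted_direction` helper are all hypothetical.

```python
from collections import defaultdict


def predicted_direction(pairs):
    """Mean (chosen - rejected) concept score across preference pairs.

    Each pair is a dict with 'chosen' and 'rejected' keys, each mapping
    concept name -> score. A positive result means the data favors the
    concept; a negative result means the data disfavors it.
    """
    if not pairs:
        return {}
    totals = defaultdict(float)
    for pair in pairs:
        concepts = set(pair["chosen"]) | set(pair["rejected"])
        for concept in concepts:
            totals[concept] += (pair["chosen"].get(concept, 0.0)
                                - pair["rejected"].get(concept, 0.0))
    return {concept: total / len(pairs) for concept, total in totals.items()}


# Made-up scores standing in for the output of an interpretability tool.
pairs = [
    {"chosen": {"refusal": 0.1, "includes_link": 0.9},
     "rejected": {"refusal": 0.8, "includes_link": 0.2}},
    {"chosen": {"refusal": 0.0, "includes_link": 0.7},
     "rejected": {"refusal": 0.6, "includes_link": 0.1}},
]
direction = predicted_direction(pairs)
```

In this toy data the chosen responses refuse less and link more than the rejected ones, so `direction` comes out negative for `refusal` and positive for `includes_link`.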

Goodfire@GoodfireAI

Example #1: breaking safety guardrails

You might expect your "alignment" data to strengthen guardrails. If you use Dolci or Tulu, you’d be disappointed. DPO on these data made models more susceptible to jailbreaks.

But knowing what specific data caused it, we can fix it! (4/9)
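Acting on a finding like this could look like the sketch below. It assumes each preference pair already carries per-response scores for an unwanted concept, produced by whatever attribution tool is in use, and splits the dataset into kept and flagged pairs so the flagged ones can be reviewed, edited, or dropped before training. The field names, the `jailbreak_compliance` concept, and the threshold are illustrative assumptions, not part of Silico's interface.

```python
def split_by_concept(pairs, concept, threshold):
    """Flag pairs whose chosen response outscores the rejected one
    on `concept` by more than `threshold`; keep the rest."""
    kept, flagged = [], []
    for pair in pairs:
        gap = (pair["chosen_scores"].get(concept, 0.0)
               - pair["rejected_scores"].get(concept, 0.0))
        if gap > threshold:
            flagged.append(pair)
        else:
            kept.append(pair)
    return kept, flagged


# Hypothetical dataset with made-up scores.
dataset = [
    {"id": "a",
     "chosen_scores": {"jailbreak_compliance": 0.9},
     "rejected_scores": {"jailbreak_compliance": 0.1}},
    {"id": "b",
     "chosen_scores": {"jailbreak_compliance": 0.2},
     "rejected_scores": {"jailbreak_compliance": 0.3}},
    {"id": "c",
     "chosen_scores": {},
     "rejected_scores": {"jailbreak_compliance": 0.0}},
]
kept, flagged = split_by_concept(dataset, "jailbreak_compliance", threshold=0.5)
```

Returning the flagged pairs instead of silently discarding them leaves room for a human to decide whether each one should be rewritten or removed.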

Goodfire@GoodfireAI

#3: sycophancy, but just for physics

This data makes models sycophantic, but only for pseudo-profound physics questions like "if the universe is governed by laws, where are they located?"

Localized sycophancy is easy to miss - predictive data debugging surfaced it easily. (6/9)

Goodfire@GoodfireAI

If you train models on preference data, you have a curriculum you've never read. Predictive data debugging lets you read it, understand it, and rewrite it. We've built it into Silico, our platform for model design.

Request access to Silico here: https://www.goodfire.ai/silico (9/9)

Goodfire@GoodfireAI

#2: hallucinated links

After DPO, the model produced links far more often when responding to prompts asking for resources on sensitive topics.

But the URLs were almost always hallucinated. The model had only learned to *look* helpful. (5/9)

Ekdeep Singh Lubana@EkdeepL

Super excited about this work! This paper was driven by a claim I've been making to anyone who'll listen: "interpretability is the language of data". (1/3)

Ekdeep Singh Lubana@EkdeepL

We use interp to understand models---but models are a compression of the data they were trained on, i.e., we can use models to *interpret data* and *make predictions* about what a training run, before even being launched, will teach a model!

Ekdeep Singh Lubana@EkdeepL

This claim's been driving a lot of my recent thinking on interp, and I'm super excited to show interp has predictive affordances: e.g., we predict *extremely surprising things* a model will learn from a dataset and confirm these behaviorally!

elie@eliebakouch

very cool work, always nice to see new ways to "look at the data"

Goodfire@GoodfireAI

@0xGTO 🧌

GTO ◅(•-•)▻@0xGTO

@GoodfireAI i see goblins!
