AI · 3h ago

Waterloo's Gautam Kamath says differentially private data synthesis causes downstream question-answering accuracy to collapse

Geminon accuracy dropped to 4 even at epsilon=100.

Original post

Gautam Kamath @thegautamkamath · #210 in AI

These tasks are hard/impossible to zero-shot, rather easy without privacy, but surprisingly hard even with large privacy budgets (ε = 100)!

This room to grow means we can really measure progress made by new DP synthetic data benchmarks. 6/n

Gautam Kamath @thegautamkamath

E.g., the Geminon task is fully synthetic. Fake creatures are randomly generated, and articles are written including their attributes. QA tasks are questions about these attributes.

News uses real data: QA tasks about news articles since the last ContinuousBench release 5/n

6:46 AM · Jun 1, 2026 · 55 Views

Sentiment

Commenters are excited about the new benchmarks, which show how differential privacy limits downstream QA accuracy on synthetic data. They say the work answers earlier calls from researchers and describe it as cool and valuable.

Pos: 100.0%

Neg: 0.0%

3 comments with sentiment.

Posts from X

Most Activity

Niloofar @niloofar_mire

@thegautamkamath This so cool and exciting!!!! Whats the tldr do we get more style or more substance transfer normally?
