Can frontier models reason about Differential Privacy?
We are excited to release DPrivBench, a benchmark curated by DP experts to evaluate how well LLMs reason about Differential Privacy. @pengrun_huang @chien_eli @omthkkr @kamalikac @yuxiangw_cs @ruihan_w
DPrivBench contains two complementary tracks:
Category 1 (preliminary): fundamental DP mechanism questions focused on sensitivity calculation and noise calibration.
Category 2 (advanced): research-level DP algorithms from the literature that require advanced, algorithm-specific mathematical reasoning.
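For a flavor of what "sensitivity calculation and noise calibration" means in Category 1, here is a minimal sketch of the textbook Laplace mechanism for a bounded sum. It is not taken from DPrivBench; the function name `laplace_sum`, its parameters, and the add/remove-one-record neighboring relation are assumptions made for illustration.

```python
import random

def laplace_sum(values, lower, upper, epsilon, rng=None):
    """Illustrative epsilon-DP sum of values clipped to [lower, upper].

    Assumes neighboring datasets differ by adding or removing one record,
    so the L1 sensitivity of the clipped sum is max(|lower|, |upper|).
    """
    rng = rng or random.Random()
    # Clip each record so a single record has bounded influence on the sum.
    clipped = [min(max(v, lower), upper) for v in values]
    # Sensitivity calculation (under the add/remove assumption above).
    sensitivity = max(abs(lower), abs(upper))
    # Noise calibration: Laplace scale b = sensitivity / epsilon.
    scale = sensitivity / epsilon
    # The difference of two Exponential(mean=scale) draws is Laplace(0, scale).
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return sum(clipped) + noise, scale

noisy_total, scale = laplace_sum([0.2, 1.5, -3.0], 0.0, 1.0, epsilon=0.5)
```

Under a replace-one-record neighboring relation the sensitivity would instead be `upper - lower`. This sketch also ignores floating-point issues that matter in production DP code.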
Our findings are both promising and cautionary: frontier models are strong on Category 1, yet still struggle with research-level DP algorithms requiring nuanced reasoning about assumptions, privacy accounting, and algorithm-specific guarantees.