These tasks are hard/impossible to zero-shot, rather easy without privacy, but surprisingly hard even with large privacy budgets (ε = 100)!
This room to grow means we can really measure the progress made by new DP synthetic data methods. 6/n
E.g., the Geminon task is fully synthetic. Fake creatures are randomly generated, and articles are written that include their attributes. The QA tasks are questions about these attributes.
The News task uses real data: QA tasks about news articles published since the last ContinuousBench release. 5/n
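
To make the fully synthetic setup concrete, here is a minimal sketch of how a Geminon-style example could be built: generate a random creature, write a short article from its attributes, and derive QA pairs from those same attributes. The attribute names, value pools, and the helpers `make_creature`, `write_article`, and `make_qa` are all assumptions for illustration; the thread does not describe the benchmark's actual schema or generation code.

```python
import random

# Hypothetical attribute pools; the real task's schema is not given in the thread.
COLORS = ["red", "green", "blue"]
HABITATS = ["swamp", "desert", "tundra"]


def make_creature(rng):
    """Randomly generate a fake creature with a made-up name and attributes."""
    syllables = (rng.choice("bdgklmnrstvz") + rng.choice("aeiou") for _ in range(3))
    return {
        "name": "".join(syllables).capitalize(),
        "color": rng.choice(COLORS),
        "habitat": rng.choice(HABITATS),
        "legs": rng.randint(2, 8),
    }


def write_article(c):
    """Write a one-sentence 'article' that mentions every attribute."""
    return (
        f"The {c['name']} is a {c['color']} creature with {c['legs']} legs "
        f"that lives in the {c['habitat']}."
    )


def make_qa(c):
    """Build (question, answer) pairs whose answers appear in the article."""
    return [
        (f"What color is the {c['name']}?", c["color"]),
        (f"Where does the {c['name']} live?", c["habitat"]),
        (f"How many legs does the {c['name']} have?", str(c["legs"])),
    ]


rng = random.Random(0)  # fixed seed so the example is reproducible
creature = make_creature(rng)
article = write_article(creature)
qa_pairs = make_qa(creature)
```

Because the creatures are invented, a model cannot answer these questions zero-shot; it has to learn the facts from the (private or synthetic) articles.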