QUICK REVIEW

[Paper Review] On the Generalization Properties of Differential Privacy

Kobbi Nissim, Uri Stemmer|arXiv (Cornell University)|Apr 22, 2015

Privacy-Preserving Technologies in Data | References: 5 | Cited by: 24

One-Sentence Summary

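This paper simplifies and improves the generalization bounds for differentially private algorithms in the adaptive statistical query (ASQ) setting. It shows that an $(\varepsilon,\delta)$-differentially private mechanism guarantees $O(\varepsilon)$ accuracy with probability $1 - O(\delta \log(1/\varepsilon)/\varepsilon)$, proves that this bound is tight up to logarithmic factors, and refutes the tempting but false guess that accuracy should hold with probability $1 - O(\delta)$. The result strengthens the theoretical foundation for privately answering adaptively chosen queries with polylogarithmic sample complexity.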

ABSTRACT

A new line of work, started by Dwork et al., studies the task of answering statistical queries using a sample and relates the problem to the concept of differential privacy. By the Hoeffding bound, a sample of size $O(\log k/\alpha^2)$ suffices to answer $k$ non-adaptive queries within error $\alpha$, where the answers are computed by evaluating the statistical queries on the sample. This argument fails when the queries are chosen adaptively (and can hence depend on the sample). Dwork et al. showed that if the answers are computed with $(\varepsilon,\delta)$-differential privacy then $O(\varepsilon)$ accuracy is guaranteed with probability $1-O(\delta^\varepsilon)$. Using the Private Multiplicative Weights mechanism, they concluded that the sample size can still grow polylogarithmically with $k$. Very recently, Bassily et al. presented an improved bound and showed that (a variant of) the private multiplicative weights algorithm can answer $k$ adaptively chosen statistical queries using sample complexity that grows logarithmically in $k$. However, their results no longer hold for every differentially private algorithm, and require modifying the private multiplicative weights algorithm in order to obtain their high probability bounds. We greatly simplify the results of Dwork et al. and improve on the bound by showing that differential privacy guarantees $O(\varepsilon)$ accuracy with probability $1-O(\delta\log(1/\varepsilon)/\varepsilon)$. It would be tempting to guess that an $(\varepsilon,\delta)$-differentially private computation should guarantee $O(\varepsilon)$ accuracy with probability $1-O(\delta)$. However, we show that this is not the case, and that our bound is tight (up to logarithmic factors).
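The Hoeffding argument in the abstract, and the way it breaks under adaptivity, can be seen in a small simulation. The Python sketch below is our own illustration, not taken from the paper: the toy population (independent uniform $\pm 1$ features and label), the function names `hoeffding_radius` and `adaptive_overfitting_demo`, and all parameters are assumptions. `hoeffding_radius` solves $2k\,e^{-2n\alpha^2} \le \beta$ for $\alpha$, which is where the $O(\log k/\alpha^2)$ sample size comes from. The answers here are exact empirical means with no privacy noise, so with these parameters one should expect the fixed queries to stay near their true value of $0.5$, while a single query built from their answers shows a much larger empirical bias.

```python
import math

import numpy as np


def hoeffding_radius(n: int, k: int, beta: float) -> float:
    """Accuracy alpha for k *fixed* [0,1]-valued queries on n i.i.d. samples,
    holding with probability >= 1 - beta (Hoeffding bound + union bound):
    2k * exp(-2 n alpha^2) <= beta  <=>  alpha = sqrt(ln(2k/beta) / (2n))."""
    return math.sqrt(math.log(2 * k / beta) / (2 * n))


def adaptive_overfitting_demo(n: int = 1001, d: int = 501, beta: float = 0.05, seed: int = 0):
    """Hypothetical toy population: d features and a label, all independent
    uniform +/-1, so every query below has population value exactly 0.5.
    n and d are odd so that the sums passed to np.sign are never zero."""
    rng = np.random.default_rng(seed)
    x = rng.choice([-1, 1], size=(n, d))
    y = rng.choice([-1, 1], size=n)

    # Round 1 (non-adaptive): query j is q_j(x, y) = (1 + x_j * y) / 2, in [0, 1].
    answers = ((1 + x * y[:, None]) / 2).mean(axis=0)
    nonadaptive_err = float(np.abs(answers - 0.5).max())

    # Round 2 (adaptive): one more query built from the round-1 answers; each
    # feature votes in the direction of its empirical correlation with y.
    signs = np.sign(answers - 0.5)
    votes = np.sign(x @ signs)
    adaptive_query = (1 + votes * y) / 2
    adaptive_err = float(abs(adaptive_query.mean() - 0.5))

    # Radius Hoeffding would promise if all d + 1 queries had been fixed in advance.
    alpha = hoeffding_radius(n, d + 1, beta)
    return alpha, nonadaptive_err, adaptive_err


alpha, nonadaptive_err, adaptive_err = adaptive_overfitting_demo()
print(f"Hoeffding radius {alpha:.3f}, worst non-adaptive error {nonadaptive_err:.3f}, "
      f"adaptive error {adaptive_err:.3f}")
```

The last query's true value is exactly $0.5$, yet it was chosen using the same sample it is evaluated on, so the union bound over fixed queries no longer applies to it.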

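Motivation and Goals

Simplify and strengthen the theoretical analysis of differential privacy in the context of adaptive statistical queries.
Settle whether $(\varepsilon,\delta)$-differential privacy guarantees $O(\varepsilon)$ accuracy with probability $1 - O(\delta)$; the paper shows that it does not.
Establish tight generalization bounds for differentially private mechanisms under adaptive query workloads, improving on prior results in terms of sample complexity and probability guarantees.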

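Proposed Approach

Re-analyzes the generalization properties of differentially private mechanisms in terms of the privacy parameters $\varepsilon$ and $\delta$, using an improved concentration argument.
Applies a new coupling technique to bound the deviation between private answers and true query values, yielding a tighter high-probability error bound.
Uses the structure of adaptive query workloads and the stability provided by differential privacy to show that the failure probability scales as $\delta \log(1/\varepsilon)/\varepsilon$ rather than $\delta$.
Constructs a counterexample showing that the bound cannot be improved to $1 - O(\delta)$: up to logarithmic factors, the failure probability must scale as $\delta/\varepsilon$, not as $\delta$.
The approach does not rely on a specific algorithm (such as private multiplicative weights); it relies only on general properties of differential privacy.
Gives a short, self-contained proof that simplifies earlier analyses while achieving a stronger and more precise bound.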
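The gap between the failure probabilities involved, $\delta^\varepsilon$ from Dwork et al., $\delta\log(1/\varepsilon)/\varepsilon$ from this paper, and the refuted guess $\delta$, is easy to see numerically. The Python sketch below sets every constant hidden in the $O(\cdot)$ notation to 1 and uses the natural logarithm; both are assumptions made for illustration, so only orders of magnitude are meaningful, and `failure_bounds` is a hypothetical helper name. For $\varepsilon = 0.1$ and $\delta = 10^{-6}$, $\delta^\varepsilon$ is about $0.25$, which is nearly vacuous, whereas $\delta\log(1/\varepsilon)/\varepsilon$ is about $2.3 \times 10^{-5}$.

```python
import math


def failure_bounds(eps: float, delta: float) -> dict:
    """Failure probabilities with every hidden O(.) constant set to 1 (assumption)."""
    return {
        "dwork_et_al": delta ** eps,                    # accuracy w.p. 1 - O(delta^eps)
        "this_paper": delta * math.log(1 / eps) / eps,  # w.p. 1 - O(delta log(1/eps)/eps)
        "tempting_guess": delta,                        # w.p. 1 - O(delta), shown to be false
    }


for eps in (0.1, 0.01):
    bounds = failure_bounds(eps, delta=1e-6)
    print(eps, {name: f"{value:.2e}" for name, value in bounds.items()})
```

As $\varepsilon$ shrinks, $\delta^\varepsilon$ approaches 1, while the new bound grows only like $\log(1/\varepsilon)/\varepsilon$ times $\delta$.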

Results

Research Questions

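RQ1: In the adaptive query setting, can differentially private mechanisms achieve $O(\varepsilon)$ generalization accuracy with high probability $1 - O(\delta)$?
RQ2: Is the bound of $O(\varepsilon)$ accuracy with probability $1 - O(\delta \log(1/\varepsilon)/\varepsilon)$ tight up to logarithmic factors?
RQ3: Why does the intuitive guess, that $O(\varepsilon)$ accuracy holds with probability $1 - O(\delta)$, fail in the adaptive query setting?
RQ4: How can the analysis of the generalization properties of differential privacy be simplified without losing tightness?
RQ5: What is the correct dependence of the failure probability on $\varepsilon$ and $\delta$ for adaptive statistical query workloads?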

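Key Findings

The paper establishes that $(\varepsilon,\delta)$-differentially private mechanisms guarantee $O(\varepsilon)$ accuracy with probability $1 - O(\delta \log(1/\varepsilon)/\varepsilon)$, a significant improvement over the $1 - O(\delta^\varepsilon)$ bound of Dwork et al.
The bound is proven tight up to logarithmic factors, so in general no substantially better failure probability can be obtained.
The intuitive guess that accuracy holds with probability $1 - O(\delta)$ is shown to be false: the failure probability must grow with $1/\varepsilon$, scaling as $\delta/\varepsilon$ up to logarithmic factors.
The analysis simplifies the result of Dwork et al. and improves its bound; unlike the result of Bassily et al., it requires no algorithm-specific modifications and comes with a cleaner proof.
The result holds for every differentially private algorithm, not only for specialized mechanisms such as private multiplicative weights.
The paper shows that, under the improved high-probability bound, the sample complexity of answering $k$ adaptive queries remains polylogarithmic in $k$.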
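Because the guarantee covers every $(\varepsilon,\delta)$-differentially private algorithm, it also covers very simple mechanisms. As an illustration only (the paper is an analysis, and this mechanism, the helper `answer_adaptively_laplace`, the analyst, and the data are our assumptions), the Python sketch below answers $k$ adaptively chosen $[0,1]$-valued statistical queries with the Laplace mechanism. Each empirical mean has sensitivity $1/n$, and splitting $\varepsilon$ evenly over the $k$ queries (basic composition) makes the whole interaction $(\varepsilon, 0)$-differentially private.

```python
import numpy as np


def answer_adaptively_laplace(sample, analyst, k, epsilon, seed=0):
    """Hypothetical helper: answer k adaptively chosen [0,1]-valued statistical
    queries with the Laplace mechanism.

    Changing one record moves an empirical mean by at most 1/n, so adding
    Laplace noise with scale (1/n) / (epsilon/k) to each answer spends epsilon/k
    per query; by basic composition the k answers are (epsilon, 0)-DP overall.
    """
    rng = np.random.default_rng(seed)
    n = len(sample)
    scale = k / (n * epsilon)
    history = []
    for _ in range(k):
        q = analyst(history)  # the next query may depend on all earlier answers
        empirical = float(np.mean(q(sample)))
        noisy = empirical + rng.laplace(0.0, scale)
        history.append(min(1.0, max(0.0, noisy)))  # clipping is post-processing
    return history


data_rng = np.random.default_rng(1)
sample = data_rng.random(20_000)  # hypothetical data: uniform on [0, 1]


def threshold_analyst(history):
    # Adaptive analyst: the next threshold is the previous (noisy) answer.
    t = 0.5 if not history else history[-1]
    return lambda s: (s < t).astype(float)


answers = answer_adaptively_laplace(sample, threshold_analyst, k=5, epsilon=1.0)
print(answers)
```

Basic composition is used only for simplicity: it makes the noise scale, and hence the required sample size, grow linearly in $k$, which is far from the polylogarithmic dependence obtained with private multiplicative weights.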

This review was generated by AI and reviewed by human editors.