QUICK REVIEW

[Paper Review] The Geometry of Differential Privacy: the Sparse and Approximate Cases

Aleksandar Nikolov, Kunal Talwar|arXiv (Cornell University)|Dec 3, 2012

Privacy-Preserving Technologies in Data | References: 68 | Cited by: 98

One-Sentence Summary

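This paper proposes an $(\varepsilon,\delta)$-differentially private mechanism for releasing linear queries that adds correlated Gaussian noise and achieves an $O(\log^2 d)$ approximation ratio, i.e. near-optimal accuracy. In addition, for sparse databases with $d > n$, the mechanism combines Gaussian noise with $\ell_1$-regularized regression to obtain a polylogarithmic approximation, improving the error bound for counting queries to $\tilde{O}(\sqrt{n})$ per query.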

ABSTRACT

In this work, we study trade-offs between accuracy and privacy in the context of linear queries over histograms. This is a rich class of queries that includes contingency tables and range queries, and has been a focus of a long line of work. For a set of $d$ linear queries over a database $x \in \mathbb{R}^N$, we seek to find the differentially private mechanism that has the minimum mean squared error. For pure differential privacy, an $O(\log^2 d)$ approximation to the optimal mechanism is known. Our first contribution is to give an $O(\log^2 d)$ approximation guarantee for the case of $(\varepsilon,\delta)$-differential privacy. Our mechanism is simple, efficient and adds correlated Gaussian noise to the answers. We prove its approximation guarantee relative to the hereditary discrepancy lower bound of Muthukrishnan and Nikolov, using tools from convex geometry. We next consider this question in the case when the number of queries exceeds the number of individuals in the database, i.e. when $d > n \triangleq \|x\|_1$. It is known that better mechanisms exist in this setting. Our second main contribution is to give an $(\varepsilon,\delta)$-differentially private mechanism which is optimal up to a $\operatorname{polylog}(d,N)$ factor for any given query set $A$ and any given upper bound $n$ on $\|x\|_1$. This approximation is achieved by coupling the Gaussian noise addition approach with a linear regression step. We give an analogous result for the $\varepsilon$-differential privacy setting. We also improve on the mean squared error upper bound for answering counting queries on a database of size $n$ by Blum, Ligett, and Roth, and match the lower bound implied by the work of Dinur and Nissim up to logarithmic factors. The connection between hereditary discrepancy and the privacy mechanism enables us to derive the first polylogarithmic approximation to the hereditary discrepancy of a matrix $A$.
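
The abstract's first mechanism adds correlated Gaussian noise to the answers $Ax$. The Python sketch below illustrates that general idea under stated assumptions; it is not the paper's construction. It assumes the noise shape is given as an invertible $d \times d$ matrix $F$ whose ellipsoid $E = F B_2^d$ contains every column $a_j$ of $A$ (choosing a good $F$ is exactly what the paper's analysis addresses and is not reproduced here). It also assumes the textbook Gaussian-mechanism calibration $\sigma = \sqrt{2\ln(1.25/\delta)}/\varepsilon$ for $0 < \varepsilon < 1$. Neighboring histograms differ by $\pm e_j$, so their answers differ by $\pm a_j$ with $\|F^{-1}a_j\|_2 \le 1$. Adding $N(0,\sigma^2 I)$ in the coordinates $F^{-1}y$ and mapping back by $F$ (post-processing) therefore yields the correlated noise $\sigma F g$. The names `solve`, `columns_in_ellipsoid`, and `correlated_gaussian_mechanism` are hypothetical.

```python
import math
import random


def solve(F, b):
    """Solve F z = b for square, invertible F by Gaussian elimination with partial pivoting."""
    n = len(F)
    M = [[float(v) for v in F[i]] + [float(b[i])] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("F is singular")
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z


def columns_in_ellipsoid(A, F):
    """True if every column a_j of A satisfies ||F^{-1} a_j||_2 <= 1, i.e. a_j lies in F B_2^d."""
    d, N = len(A), len(A[0])
    for j in range(N):
        z = solve(F, [A[i][j] for i in range(d)])
        if math.sqrt(sum(v * v for v in z)) > 1 + 1e-9:
            return False
    return True


def correlated_gaussian_mechanism(A, x, F, eps, delta, rng=random):
    """Release A x + sigma * F g with g ~ N(0, I_d) (hypothetical sketch, see lead-in)."""
    if not (0 < eps < 1 and 0 < delta < 1):
        raise ValueError("this calibration assumes 0 < eps < 1 and 0 < delta < 1")
    if not columns_in_ellipsoid(A, F):
        raise ValueError("every column of A must lie in the ellipsoid F B_2^d")
    d = len(A)
    # Sensitivity is 1 in the coordinates F^{-1} y, so the textbook Gaussian scale applies.
    sigma = math.sqrt(2 * math.log(1.25 / delta)) / eps
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    y = [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(d)]
    # Map the isotropic noise back through F: the noise covariance is sigma^2 F F^T.
    return [y[i] + sigma * sum(F[i][k] * g[k] for k in range(d)) for i in range(d)]


# Prefix (range) queries over a 2-bin histogram; F = A satisfies the containment exactly.
A = [[1.0, 0.0], [1.0, 1.0]]
print(correlated_gaussian_mechanism(A, [3.0, 2.0], A, 0.5, 1e-5, random.Random(0)))
```

Choosing $F = A$ (possible when $A$ is square and invertible) amounts to perturbing the histogram itself. In this sketch the expected total squared error is $\sigma^2\|F\|_F^2$, so the quality of the mechanism depends entirely on how small an ellipsoid containing the columns of $A$ one can find.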

Research Motivation and Goals

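Fill the gap in approximation guarantees for $(\varepsilon,\delta)$-differentially private mechanisms that answer linear queries over histograms.
Address the accuracy of query answering when the number of queries $d$ exceeds the number of individuals $n$ (the sparse case), where the standard lower bounds are no longer tight.
Improve the error bounds for counting queries under differential privacy, particularly in the sparse case $d > n$.
Establish a connection between hereditary discrepancy and differentially private mechanisms, yielding a polylogarithmic approximation to hereditary discrepancy.
Design simple, efficient mechanisms that achieve near-optimal error under both pure and approximate differential privacy.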

Proposed Method

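Construct an $(\varepsilon,\delta)$-differentially private mechanism that adds correlated Gaussian noise and achieves an $O(\log^2 d)$ approximation ratio relative to the optimal mechanism.
Use tools from convex geometry to prove the mechanism's approximation guarantee relative to the hereditary discrepancy lower bound of [MN12].
In the sparse case, couple Gaussian noise addition with a least-squares regression step over the $\ell_1$-ball ($\ell_1$-regularized regression), obtaining error within a $\operatorname{polylog}(d,N)$ factor of optimal.
Solve the SDP feasibility problem with a constructive method based on sampling and truncation, ensuring that the empirical estimates satisfy the constraints up to a constant factor.
Use the fact that differential privacy bounds the variance of inverse query responses from below to derive a lower bound on the covariance matrix of the mechanism's output.
Construct a feasible solution to the semidefinite program (SDP) by iterative refinement, combining positive semidefinite matrices derived from query subproblems.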
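
The sparse-case step above pairs Gaussian noise with a regression over the $\ell_1$-ball. The Python sketch below shows one way such a step can look. It assumes the step is a least-squares fit of the noisy answers $\tilde y$ over the set $\{Az : \|z\|_1 \le n\}$, i.e. a Euclidean projection onto $nK$ with $K = A B_1^N$. The solver is generic projected gradient descent combined with the standard sort-and-threshold projection onto the $\ell_1$-ball, a swapped-in technique rather than the paper's algorithm. The names `project_l1_ball`, `l1_constrained_least_squares`, and `denoise_answers` are hypothetical.

```python
import math


def project_l1_ball(v, radius):
    """Euclidean projection of v onto {w : ||w||_1 <= radius} (sort-and-threshold method)."""
    if radius <= 0:
        return [0.0] * len(v)
    if sum(abs(t) for t in v) <= radius:
        return [float(t) for t in v]
    u = sorted((abs(t) for t in v), reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        candidate = (cumsum - radius) / j
        if uj - candidate > 0:
            theta = candidate  # keeps the threshold for the largest valid j
    # Soft-threshold every coordinate by theta, preserving signs.
    return [math.copysign(max(abs(t) - theta, 0.0), t) for t in v]


def l1_constrained_least_squares(A, y, n, iters=2000):
    """Approximately minimize 0.5 * ||A z - y||_2^2 subject to ||z||_1 <= n
    by projected gradient descent."""
    d, N = len(A), len(A[0])
    # ||A||_F^2 upper-bounds ||A||_2^2, the Lipschitz constant of the gradient,
    # so 1 / L is a safe (if conservative) step size.
    L = sum(a * a for row in A for a in row)
    z = [0.0] * N
    if L == 0:
        return z
    for _ in range(iters):
        residual = [sum(A[i][j] * z[j] for j in range(N)) - y[i] for i in range(d)]
        grad = [sum(A[i][j] * residual[i] for i in range(d)) for j in range(N)]
        z = project_l1_ball([z[j] - grad[j] / L for j in range(N)], n)
    return z


def denoise_answers(A, y_noisy, n, iters=2000):
    """Replace noisy answers by (approximately) the closest point A z with ||z||_1 <= n."""
    z = l1_constrained_least_squares(A, y_noisy, n, iters)
    return [sum(A[i][j] * z[j] for j in range(len(z))) for i in range(len(A))]


# With A = I and n = 2, the noisy answers (3, 1) are pulled back onto the l1-ball of radius 2.
print(denoise_answers([[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0], 2.0))
```

The intuition this illustrates: when $n$ is small relative to $d$, the true answers lie in the small set $nK$, so projecting onto it can remove much of the added noise. The exact guarantees are the paper's and are not claimed for this sketch.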

Results

Research Questions

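RQ1: For general linear queries, can an $(\varepsilon,\delta)$-differentially private mechanism achieve a polylogarithmic approximation ratio?
RQ2: When $d > n$, what is the optimal trade-off between accuracy and privacy? Can mechanisms be designed that beat existing bounds in this sparse regime?
RQ3: Can the hereditary discrepancy lower bound be used to derive approximation guarantees for differentially private mechanisms?
RQ4: Can the error bound for counting queries under differential privacy be improved beyond the $\tilde{O}(n^{2/3})$ bound of prior work?
RQ5: In the sparse setting, can one construct a differentially private mechanism whose $\ell_2^2$ error is within a $\operatorname{polylog}(d,N)$ factor of optimal?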
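
RQ3 and the [MN12] lower bound mentioned above rely on the hereditary discrepancy of the query matrix $A$, which this summary does not define. Under the standard combinatorial definition, $\operatorname{disc}(A) = \min_{s \in \{-1,+1\}^N} \|As\|_\infty$ and $\operatorname{herdisc}(A) = \max_{S \subseteq [N]} \operatorname{disc}(A_S)$, where $A_S$ keeps only the columns in $S$. The paper may work with a related variant of this quantity. The brute-force Python sketch below evaluates the standard definition on tiny matrices. It takes exponential time and serves only to illustrate the definition; the paper's result is a polynomial-time polylogarithmic approximation. The names `disc` and `herdisc` are hypothetical.

```python
from itertools import combinations, product


def disc(A, cols):
    """Discrepancy of A restricted to the columns in `cols`:
    min over sign vectors s of max over rows i of |sum_j A[i][j] * s_j|."""
    if not cols:
        return 0
    best = float("inf")
    for signs in product((-1, 1), repeat=len(cols)):
        worst = max(abs(sum(row[j] * s for j, s in zip(cols, signs))) for row in A)
        best = min(best, worst)
    return best


def herdisc(A):
    """Hereditary discrepancy: the largest discrepancy over all column subsets.
    Brute force, exponential in the number of columns; tiny matrices only."""
    n_cols = len(A[0])
    return max(
        disc(A, cols)
        for k in range(n_cols + 1)
        for cols in combinations(range(n_cols), k)
    )


# Three sets forming an odd cycle ({0,1}, {1,2}, {0,2}) cannot be 2-colored
# without a monochromatic set, so the hereditary discrepancy is 2.
print(herdisc([[1, 1, 0], [0, 1, 1], [1, 0, 1]]))  # prints 2
```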

Key Findings

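The proposed $(\varepsilon,\delta)$-differentially private mechanism achieves an $O(\log^2 d)$ approximation ratio relative to the optimal mechanism, matching the best known bound for pure differential privacy.
In the sparse case $d > n$, the mechanism's mean squared error is within a $\operatorname{polylog}(d,N)$ factor of optimal, significantly improving on previous bounds.
For arbitrary counting queries, the mechanism achieves an expected error of $\tilde{O}(\sqrt{n})$ per query, improving on the $\tilde{O}(n^{2/3})$ bound of [BLR08] and matching the lower bound of [DN03] up to logarithmic factors.
Through the connection to differentially private mechanisms, the paper gives the first polylogarithmic approximation to the hereditary discrepancy of a matrix $A$.
The mechanism is simple and efficient, relying on correlated Gaussian noise and $\ell_1$-regularized regression; its analysis rests on a constructive SDP approach.
The analysis shows that differential privacy implies a lower bound on the variance of inverse query responses, which makes it possible to construct a feasible solution to the semidefinite program.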
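
For a rough sense of the counting-query improvement listed above, the leading powers of $n$ in the two per-query bounds can be compared while ignoring the logarithmic factors hidden by $\tilde{O}$. The comparison therefore indicates scaling only, not actual error values:

```latex
\frac{n^{2/3}}{n^{1/2}} = n^{2/3 - 1/2} = n^{1/6},
\qquad \text{e.g. } n = 10^{6}:\ n^{2/3} = 10^{4},\ n^{1/2} = 10^{3},\ n^{1/6} = 10.
```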

This review was generated by AI and checked by human editors.