QUICK REVIEW

[Paper Review] General Gaussian Noise Mechanisms and Their Optimality for Unbiased Mean Estimation

Aleksandar Nikolov, Haohua Tang | arXiv (Cornell University) | Jan 31, 2023

Privacy-Preserving Technologies in Data | Cited by 1

One-Sentence Summary

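The paper shows that, in high dimensions, Gaussian noise mechanisms are nearly optimal for unbiased differentially private mean estimation. It derives the optimal covariance of the Gaussian noise under $\ell_p$-norm error, generalizing earlier work on symmetric polytopes to arbitrary bounded domains, and proves that under concentrated and approximate differential privacy the Gaussian mechanism nearly matches the minimum error of any unbiased private estimator.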

ABSTRACT

We investigate unbiased high-dimensional mean estimators in differential privacy. We consider differentially private mechanisms whose expected output equals the mean of the input dataset, for every dataset drawn from a fixed bounded $d$-dimensional domain $K$. A classical approach to private mean estimation is to compute the true mean and add unbiased, but possibly correlated, Gaussian noise to it. In the first part of this paper, we study the optimal error achievable by a Gaussian noise mechanism for a given domain $K$ when the error is measured in the $\ell_p$ norm for some $p \ge 2$. We give algorithms that compute the optimal covariance for the Gaussian noise for a given $K$ under suitable assumptions, and prove a number of nice geometric properties of the optimal error. These results generalize the theory of factorization mechanisms from domains $K$ that are symmetric and finite (or, equivalently, symmetric polytopes) to arbitrary bounded domains. In the second part of the paper we show that Gaussian noise mechanisms achieve nearly optimal error among all private unbiased mean estimation mechanisms in a very strong sense. In particular, for every input dataset, an unbiased mean estimator satisfying concentrated differential privacy introduces approximately at least as much error as the best Gaussian noise mechanism. We extend this result to local differential privacy, and to approximate differential privacy, but for the latter the error lower bound holds either for a dataset or for a neighboring dataset, and this relaxation is necessary.
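The classical approach described in the abstract (release the true mean plus unbiased, possibly correlated Gaussian noise) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's algorithm: it assumes ρ-zero-concentrated DP (zCDP), neighboring datasets that differ by replacing one record, a domain K given as a finite list of points (for example, the vertices of a polytope), and noise covariance σ²LLᵀ given by a lower-triangular factor L. The helper names `forward_solve`, `mahalanobis_sensitivity`, and `gaussian_mean_mechanism` are hypothetical. The calibration relies on the standard fact that the Gaussian mechanism with ℓ2 sensitivity Δ and noise N(0, σ²I) is Δ²/(2σ²)-zCDP, applied after the change of variables L⁻¹.

```python
import itertools
import math
import random


def forward_solve(L, v):
    """Solve L y = v for lower-triangular L (nonzero diagonal) by forward substitution."""
    y = []
    for i in range(len(v)):
        y.append((v[i] - sum(L[i][j] * y[j] for j in range(i))) / L[i][i])
    return y


def mahalanobis_sensitivity(K, L, n):
    """max over x, x' in K of ||L^{-1}(x - x')||_2 / n  (hypothetical helper).

    Replacing one of n records moves the mean by (x - x') / n. K should list points
    whose convex hull contains every possible record (e.g. a polytope's vertices);
    the norm is convex, so its maximum over the hull is attained at those points.
    """
    width = max(
        math.sqrt(sum(t * t for t in forward_solve(L, [a - b for a, b in zip(x, xp)])))
        for x, xp in itertools.combinations(K, 2)
    )
    return width / n


def gaussian_mean_mechanism(data, K, L, rho, rng):
    """Release mean(data) + sigma * L z with z ~ N(0, I), calibrated for rho-zCDP.

    The output is N(mean, sigma^2 L L^T); after the change of variables L^{-1} this is
    the standard Gaussian mechanism, which is Delta^2 / (2 sigma^2)-zCDP.
    """
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    sigma = mahalanobis_sensitivity(K, L, n) / math.sqrt(2 * rho)
    z = [rng.gauss(0.0, 1.0) for _ in range(d)]
    # L is lower triangular, so only entries j <= i contribute to (L z)_i.
    noise = [sigma * sum(L[i][j] * z[j] for j in range(i + 1)) for i in range(d)]
    return [m + e for m, e in zip(mean, noise)]


K = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0)]  # vertices of [-1, 1]^2
identity = [[1.0, 0.0], [0.0, 1.0]]
data = [(1.0, 1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0)]  # true mean (0.5, 0.5)
print(gaussian_mean_mechanism(data, K, identity, rho=1.0, rng=random.Random(0)))
```

Because the noise has mean zero, averaging many releases recovers the true mean. Passing a non-identity lower-triangular L (for instance, a Cholesky factor of a chosen covariance) makes the noise correlated while keeping the estimator unbiased; only the value of σ changes.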

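Research Motivation and Goals

Characterize the optimal Gaussian noise mechanism for unbiased, differentially private mean estimation in high dimensions.
Extend the theory of factorization mechanisms from finite symmetric domains to arbitrary bounded domains.
Show that, under concentrated and approximate differential privacy, Gaussian noise mechanisms achieve nearly optimal error among all unbiased private estimators.
Derive tight bounds on the $\Gamma_p$ norm for structured domains such as tensor products and sets of marginal queries.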

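Proposed Method

Proposes a framework for computing the optimal covariance matrix of the Gaussian noise in high-dimensional mean estimation under $\ell_p$-norm error.
Uses geometric and convex-analytic techniques to characterize the optimal noise distribution for an arbitrary bounded domain $K \subseteq \mathbb{R}^d$.
Applies duality and symmetry arguments to obtain closed-form expressions for the optimal noise in special cases, such as $\ell$-fold tensor products of unit balls.
Uses the $\Gamma_p$ norm as the key error measure and relates it to the geometry of the domain $K$.
Proves lower bounds on the error via projections onto coordinate subspaces, leveraging known results for symmetric domains.
Extends the results to local and approximate differential privacy; in the approximate case, the lower bound holds for either the dataset or a neighboring dataset.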
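Why the optimal covariance depends on the geometry of K can be seen in the simplest case, p = 2. Assuming ρ-zCDP, replace-one neighbors, and K given as a finite point set, Gaussian noise with covariance shape S has root-mean-squared ℓ2 error width_S(K) · sqrt(tr S) / (n · sqrt(2ρ)), where width_S(K) is the largest value of sqrt((x − x')ᵀ S⁻¹ (x − x')) over pairs of points in K. The sketch below (hypothetical helper names, a loose illustration in the spirit of the Γ₂-type quantity rather than the paper's definition) evaluates this scale-invariant cost for two candidate shapes on an elongated K. It only compares candidates; it does not solve the covariance optimization, which the paper addresses for general p ≥ 2.

```python
import itertools
import math


def cholesky(S):
    """Return lower-triangular L with L L^T == S (S symmetric positive definite)."""
    d = len(S)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward_solve(L, v):
    """Solve L y = v by forward substitution."""
    y = []
    for i in range(len(v)):
        y.append((v[i] - sum(L[i][j] * y[j] for j in range(i))) / L[i][i])
    return y


def l2_cost(K, S):
    """Scale-invariant l2 cost of noise shape S on K (hypothetical helper).

    width = max over pairs x, x' in K of sqrt((x - x')^T S^{-1} (x - x')),
    cost  = width * sqrt(trace(S)).
    Dividing by n * sqrt(2 * rho) gives the RMS l2 error of the rho-zCDP mechanism.
    """
    L = cholesky(S)  # ||L^{-1} v||^2 == v^T S^{-1} v
    width = max(
        math.sqrt(sum(t * t for t in forward_solve(L, [a - b for a, b in zip(x, xp)])))
        for x, xp in itertools.combinations(K, 2)
    )
    return width * math.sqrt(sum(S[i][i] for i in range(len(S))))


# An elongated, symmetric domain: long along (1, 1), short along (1, -1).
K = [(1.0, 1.0), (-1.0, -1.0), (0.1, -0.1), (-0.1, 0.1)]

identity = [[1.0, 0.0], [0.0, 1.0]]
# S = 1 * u u^T + 0.01 * v v^T with u = (1, 1)/sqrt(2), v = (1, -1)/sqrt(2).
aligned = [[0.505, 0.495], [0.495, 0.505]]

print(l2_cost(K, identity))  # about 4.0
print(l2_cost(K, aligned))   # about 2.8425 (= sqrt(8.08))
```

In this toy case the identity shape costs 4, while the shape stretched along (1, 1) costs about 2.84, so correlated noise aligned with K lowers the ℓ2 error by roughly 29%. The cost is unchanged if S is multiplied by a constant, which is why only the shape of the covariance matters.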

Results

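Research Questions

RQ1: For an arbitrary bounded domain $K \subseteq \mathbb{R}^d$ and $\ell_p$-norm error, what is the optimal Gaussian noise mechanism for unbiased mean estimation?
RQ2: How does the optimal error depend on the geometry of $K$, in particular for structured domains such as tensor products of unit balls?
RQ3: Can Gaussian noise mechanisms be shown to be nearly optimal among all unbiased private estimators under concentrated differential privacy?
RQ4: What is the minimum achievable error for releasing $\ell$-way marginal queries under differential privacy?
RQ5: How do the error bounds for general domains relate to those for symmetric or finite domains?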

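Key Findings

For any bounded domain $K \subseteq \mathbb{R}^d$ and $p \ge 2$, the covariance of the Gaussian noise that minimizes $\ell_p$ error can be computed (under suitable assumptions) by solving a geometric optimization problem.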
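The $\Gamma_p$ norm of $K^{\ell}_{d,\infty}$ (the $\ell$-fold tensor product of $\{-1,+1\}^d$) is bounded by $(d/\ell)^{\ell/p + \ell/2}$, consistent with the optimal error for this domain.
For $\ell$-way marginal queries, the $\Gamma_p$ norm of the query set $K^{\mathrm{marg}}_{d,\ell}$ lies between $d^{\ell/2 + \ell/p}$ and $d^{\ell/p}$, with tight asymptotic bounds.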
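Under concentrated differential privacy, the Gaussian mechanism achieves nearly optimal error; under approximate differential privacy, the lower bound holds for either the dataset or a neighboring dataset.
The results generalize factorization mechanisms from finite symmetric domains to arbitrary bounded domains, unifying a broad class of private mean estimators.
The error lower bound for general unbiased mechanisms is shown to be tight up to constant factors, establishing the near-optimality of Gaussian noise mechanisms.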
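The exponent $\ell/p + \ell/2$ in the tensor-product bound can be sanity-checked against the plain Gaussian mechanism with identity covariance. This sketch assumes, following the parenthetical in the findings, that $K^{\ell}_{d,\infty}$ is the set of $\ell$-fold Kronecker products of vectors in $\{-1,+1\}^d$, and again assumes ρ-zCDP with replace-one neighbors; all function names are hypothetical. Every such point is a ±1 vector of dimension $d^\ell$, so the ℓ2 diameter is $2d^{\ell/2}$, and the $\ell_p$ norm of $d^\ell$-dimensional standard Gaussian noise scales like $d^{\ell/p}$. Identity-covariance noise therefore has error of order $d^{\ell/2 + \ell/p}$. The sketch checks this growth rate numerically; it does not compute the paper's $\Gamma_p$ norm or its optimal covariance.

```python
import itertools
import math


def tensor_points(d, ell):
    """All ell-fold Kronecker products of sign vectors in {-1, +1}^d, as flat tuples."""
    signs = list(itertools.product((-1, 1), repeat=d))
    points = set()
    for factors in itertools.product(signs, repeat=ell):
        vec = (1,)
        for f in factors:
            vec = tuple(a * b for a in vec for b in f)  # Kronecker product vec (x) f
        points.add(vec)
    return sorted(points)


def l2_diameter(points):
    """Largest Euclidean distance between two points (brute force)."""
    return max(math.dist(x, y) for x, y in itertools.combinations(points, 2))


def gaussian_abs_moment(p):
    """E|g|^p for g ~ N(0, 1)."""
    return 2 ** (p / 2) * math.gamma((p + 1) / 2) / math.sqrt(math.pi)


def identity_noise_lp_bound(d, ell, p, n, rho):
    """Upper bound on E||noise||_p for identity-covariance noise on K^ell_{d,inf}.

    Assumes rho-zCDP with replace-one neighbors: sigma = diameter / (n * sqrt(2 rho)),
    diameter = 2 * d^(ell/2); Jensen gives E||z||_p <= (m * E|g|^p)^(1/p), m = d^ell.
    """
    m = d ** ell
    sigma = (2 * d ** (ell / 2) / n) / math.sqrt(2 * rho)
    return sigma * (m * gaussian_abs_moment(p)) ** (1 / p)


# Brute-force check of the diameter formula on a tiny instance (d = 2, ell = 2).
pts = tensor_points(2, 2)
print(len(pts[0]), l2_diameter(pts))  # 4 4.0

# The bound grows like d^(ell/2 + ell/p): this ratio does not depend on d.
for d in (4, 8, 16):
    print(d, identity_noise_lp_bound(d, 2, 4, 100, 0.5) / d ** (2 / 2 + 2 / 4))
```

The brute-force enumeration is exponential in $d\ell$ and is meant only for tiny instances; the closed-form bound is what scales. The printed ratio is the same for every $d$ (up to floating-point rounding), confirming the $d^{\ell/2 + \ell/p}$ growth of the identity-covariance baseline.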


This review was generated by AI and reviewed by human editors.