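[Paper Summary] The Cost of Privacy: Optimal Rates of Convergence for Parameter Estimation with Differential Privacy
This paper establishes minimax lower bounds and matches them, up to logarithmic factors, with differentially private algorithms for mean estimation and linear regression, characterizing the optimal tradeoff between privacy $(\varepsilon,\delta)$ and statistical accuracy in both low- and high-dimensional settings.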
Privacy-preserving data analysis is a rising challenge in contemporary statistics, as the privacy guarantees of statistical methods are often achieved at the expense of accuracy. In this paper, we investigate the tradeoff between statistical accuracy and privacy in mean estimation and linear regression, under both the classical low-dimensional and modern high-dimensional settings. A primary focus is to establish minimax optimality for statistical estimation with the $(\varepsilon,\delta)$-differential privacy constraint. To this end, we find that classical lower bound arguments fail to yield sharp results, and new technical tools are called for. By refining the "tracing adversary" technique for lower bounds in the theoretical computer science literature, we formulate a general lower bound argument for minimax risks with differential privacy constraints, and apply this argument to high-dimensional mean estimation and linear regression problems. We also design computationally efficient algorithms that attain the minimax lower bounds up to a logarithmic factor. In particular, for the high-dimensional linear regression, a novel private iterative hard thresholding pursuit algorithm is proposed, based on a privately truncated version of stochastic gradient descent. The numerical performance of these algorithms is demonstrated by simulation studies and applications to real data containing sensitive information, for which privacy-preserving statistical methods are necessary.
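Research motivation and objectives
- Motivate and formalize the cost of privacy under $(\varepsilon,\delta)$-differential privacy in fundamental estimation problems.
- Derive minimax lower bounds for mean estimation and linear regression under privacy constraints.
- Design differentially private algorithms that attain these lower bounds up to logarithmic factors.
- Provide theoretical and empirical validation of the privacy-accuracy tradeoff in low- and high-dimensional settings.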
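Proposed method
- Define the cost of privacy through the minimax risk of mean estimation and linear regression under $(\varepsilon,\delta)$-DP.
- Refine the tracing adversary technique to obtain sharp DP lower bounds for low- and high-dimensional problems.
- Construct DP algorithms (Gaussian/noisy mechanisms, private iterative methods) that attain the lower bounds up to logarithmic factors.
- Introduce a private sparse mean estimator that selects coordinates privately via a peeling mechanism.
- Analyze convergence rates and derive bounds in the appropriate regimes, such as $\tilde{O}\left(d^2 \log(1/\delta) / (n^2 \varepsilon^2)\right)$.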
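The Gaussian-mechanism step named in the list above can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a simple clip-then-add-noise mean estimator, replace-one neighboring datasets, and the classical calibration $\sigma = \Delta\sqrt{2\log(1.25/\delta)}/\varepsilon$, which is valid for $0 < \varepsilon < 1$. The function name `private_mean` and the clipping parameter `radius` are hypothetical.

```python
import numpy as np


def private_mean(x, radius, epsilon, delta, rng=None):
    """Clipped sample mean released through the Gaussian mechanism (illustrative)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    n, d = x.shape

    # Clip every row to L2 norm at most `radius` so one record has bounded influence.
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    clipped = x * np.minimum(1.0, radius / np.maximum(norms, 1e-12))

    # Replacing one row moves the clipped mean by at most 2 * radius / n in L2 norm.
    sensitivity = 2.0 * radius / n

    # Classical Gaussian-mechanism noise level (assumes 0 < epsilon < 1).
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon

    return clipped.mean(axis=0) + rng.normal(0.0, sigma, size=d)


# Usage on synthetic data: 5000 samples in 10 dimensions with true mean 1.
rng = np.random.default_rng(0)
data = rng.normal(1.0, 1.0, size=(5000, 10))
est = private_mean(data, radius=10.0, epsilon=0.5, delta=1e-5, rng=rng)
print(est.round(2))
```

Under these assumptions the added noise contributes $d\sigma^2 = 8 d R^2 \log(1.25/\delta)/(n^2\varepsilon^2)$ to the squared error, where $R$ is `radius`. When $R$ grows like $\sqrt{d}$, this has the same $d^2\log(1/\delta)/(n^2\varepsilon^2)$ shape as the rate quoted above.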
Experimental results
Research questions
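- RQ1: What are the minimax risks of estimating mean vectors and regression coefficients under $(\varepsilon,\delta)$-DP in low- and high-dimensional settings?
- RQ2: Can differentially private algorithms attain the corresponding minimax lower bounds, establishing optimal rates of convergence?
- RQ3: How does sparsity affect the cost of privacy in high-dimensional mean estimation and regression?
- RQ4: Which practical private algorithms attain these optimal rates, and how do they perform empirically?
- RQ5: How does the cost of privacy compare with the classical statistical risk across different regimes of $n$, $d$, and $s^*$?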
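Main findings
- The cost of privacy dominates the statistical risk when $d \log(1/\delta) / (n \varepsilon^2)$ is large; for mean estimation the bound takes the form $\Omega\left(d/n + d^2 \log(1/\delta) / (n^2 \varepsilon^2)\right)$.
- New lower bounds are established for high-dimensional mean estimation and linear regression under DP, with rates that include a $(s \log d)^2 / (n^2 \varepsilon^2)$ term.
- DP algorithms that attain the lower bounds up to logarithmic factors are proposed, including noisy gradient descent for regression, with convergence rate $\tilde{O}\left(d^2 \log(1/\delta) / (n^2 \varepsilon^2)\right)$, and a private peeling-based sparse mean estimator that attains the sparse lower bound.
- For high-dimensional sparse estimation, the DP rate depends on $(s \log d)^2$ and is near-optimal up to logarithmic factors, showing that DP is feasible in high-dimensional regimes.
- Numerical simulations and real-data applications illustrate the privacy-accuracy tradeoff.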
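To make the regime comparison concrete, the short sketch below evaluates the two terms of the low-dimensional mean-estimation rate reported in this summary, $d/n$ and $d^2\log(1/\delta)/(n^2\varepsilon^2)$, and reports which one is larger. Constants and logarithmic factors are ignored, so the numbers only indicate orders of magnitude; the function name `risk_terms` and the chosen parameter values are hypothetical.

```python
import math


def risk_terms(n, d, epsilon, delta):
    """Return (statistical term, privacy term) of the rate, ignoring constants."""
    statistical = d / n
    privacy = d**2 * math.log(1.0 / delta) / (n**2 * epsilon**2)
    return statistical, privacy


# Sweep the sample size with d, epsilon, and delta held fixed.
for n in (1_000, 10_000, 100_000):
    stat, priv = risk_terms(n, d=50, epsilon=0.5, delta=1e-5)
    regime = "privacy-dominated" if priv > stat else "statistics-dominated"
    print(f"n={n:>7}: statistical={stat:.2e}, privacy={priv:.2e}, {regime}")
```

The ratio of the two terms is exactly $d\log(1/\delta)/(n\varepsilon^2)$, so the privacy term shrinks relative to the statistical term as $n$ grows with the other parameters held fixed.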
This summary was generated by AI and reviewed by a human editor.