QUICK REVIEW

[Paper Review] Differentially Private Empirical Risk Minimization: Efficient Algorithms and Tight Error Bounds

Raef Bassily, Adam Smith|arXiv (Cornell University)|May 27, 2014

Privacy-Preserving Technologies in Data|References: 20|Cited by: 60

One-Sentence Summary

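Under minimal assumptions (Lipschitz loss functions and a bounded optimization domain), this paper gives efficient differentially private algorithms for convex empirical risk minimization (ERM) that achieve optimal excess risk, together with matching lower bounds. It develops separate optimal techniques for $(\epsilon,0)$- and $(\epsilon,\delta)$-differential privacy: the former samples from the exponential mechanism, the latter uses noisy gradient descent, and a localization technique handles the strongly convex case. All algorithms run in polynomial time, and in some cases they match the oracle complexity of the optimal non-private method.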

ABSTRACT

In this paper, we initiate a systematic investigation of differentially private algorithms for convex empirical risk minimization. Various instantiations of this problem have been studied before. We provide new algorithms and matching lower bounds for private ERM assuming only that each data point's contribution to the loss function is Lipschitz bounded and that the domain of optimization is bounded. We provide a separate set of algorithms and matching lower bounds for the setting in which the loss functions are known to also be strongly convex. Our algorithms run in polynomial time, and in some cases even match the optimal non-private running time (as measured by oracle complexity). We give separate algorithms (and lower bounds) for $(ε,0)$- and $(ε,δ)$-differential privacy; perhaps surprisingly, the techniques used for designing optimal algorithms in the two cases are completely different. Our lower bounds apply even to very simple, smooth function families, such as linear and quadratic functions. This implies that algorithms from previous work can be used to obtain optimal error rates, under the additional assumption that the contributions of each data point to the loss function is smooth. We show that simple approaches to smoothing arbitrary loss functions (in order to apply previous techniques) do not yield optimal error rates. In particular, optimal algorithms were not previously known for problems such as training support vector machines and the high-dimensional median.

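Motivation and Goals

Design efficient differentially private algorithms for convex empirical risk minimization under minimal assumptions: a Lipschitz loss and a bounded domain.
Establish matching lower bounds on excess risk for $(\epsilon,0)$- and $(\epsilon,\delta)$-differential privacy.
Design algorithms that achieve optimal excess risk while running in polynomial time, in some cases matching the oracle complexity of non-private methods.
Close the gap to optimal error rates on non-smooth problems, such as SVMs and the high-dimensional median, where earlier smoothing techniques fall short.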
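
For reference, the quantities named in these goals can be written out with the standard definitions. This is a sketch in assumed notation: the symbols $\ell$, $\theta^{*}$, $\hat\theta$ and $\mathcal{A}$ are chosen here for illustration and may differ from the paper's, while $L$, $\mathcal{C}$, $\|\mathcal{C}\|_2$ (the diameter of $\mathcal{C}$) and $\mathcal{L}(\theta;\mathcal{D})$ follow the formulas used elsewhere in this review.

```latex
\begin{aligned}
&\text{Empirical risk (summed over } n \text{ data points):}
  && \mathcal{L}(\theta;\mathcal{D}) = \sum_{i=1}^{n} \ell(\theta; d_i),
     \quad \theta \in \mathcal{C} \\
&\text{Lipschitz assumption:}
  && |\ell(\theta; d) - \ell(\theta'; d)| \le L\,\|\theta - \theta'\|_2
     \quad \text{for all } \theta,\theta' \in \mathcal{C} \\
&\text{Excess risk of an output } \hat\theta:
  && \mathcal{L}(\hat\theta;\mathcal{D}) - \mathcal{L}(\theta^{*};\mathcal{D}),
     \quad \theta^{*} \in \arg\min_{\theta \in \mathcal{C}} \mathcal{L}(\theta;\mathcal{D}) \\
&(\epsilon,\delta)\text{-differential privacy of } \mathcal{A}:
  && \Pr[\mathcal{A}(\mathcal{D}) \in S] \le e^{\epsilon}\,\Pr[\mathcal{A}(\mathcal{D}') \in S] + \delta
\end{aligned}
```

The last inequality must hold for every measurable set $S$ and every pair of datasets $\mathcal{D}, \mathcal{D}'$ that differ in a single entry; setting $\delta = 0$ gives $(\epsilon,0)$-differential privacy.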

Proposed Methods

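For $(\epsilon,\delta)$-privacy, the paper uses a noisy variant of gradient descent: each step follows a gradient perturbed with random noise and is projected back onto the domain $\mathcal{C}$.
For $(\epsilon,0)$-privacy, it uses the exponential mechanism, implemented through efficient log-concave sampling over a suitably scaled loss function.
The sampler draws from a log-concave distribution over a convex set, building on advances in isotropic transformations and Markov chain Monte Carlo methods.
Privacy is established by bounding the distance between the exact and the approximate sampling distributions, using a variant of the exponential mechanism with a calibrated sensitivity parameter.
The loss is scaled by $\frac{\epsilon}{6L\|\mathcal{C}\|_2}$ to ensure $\epsilon$-differential privacy, and the output is sampled from the distribution proportional to $\exp\left(-\frac{\epsilon}{6L\|\mathcal{C}\|_2}\mathcal{L}(\theta;\mathcal{D})\right)$, where $L$ is the Lipschitz constant and $\|\mathcal{C}\|_2$ is the diameter of $\mathcal{C}$.
The sampling step reduces the convex body to isotropic position and replaces sampling over the body with sampling over a cube, using a penalty function that preserves convexity and keeps sampling efficient.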
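
The sampling distribution above can be illustrated on a toy problem. The following is a minimal sketch, not the paper's sampler: it assumes a one-dimensional domain $\mathcal{C} = [0, 1]$, the median loss $\ell(\theta; d) = |\theta - d|$ (so $L = 1$ and $\|\mathcal{C}\|_2 = 1$), and it replaces efficient log-concave sampling with a brute-force grid, which is only feasible in very low dimension. The function name `exp_mech_sample`, the dataset and the grid size are all hypothetical.

```python
import numpy as np

def exp_mech_sample(data, grid, eps, lipschitz, diameter, rng):
    """Draw one grid point with probability proportional to
    exp(-eps / (6 * L * ||C||_2) * total_loss(theta))."""
    # Summed (unnormalized) loss of every candidate: sum_i |theta - d_i|.
    losses = np.abs(grid[:, None] - data[None, :]).sum(axis=1)
    scale = eps / (6.0 * lipschitz * diameter)
    logits = -scale * losses
    logits -= logits.max()  # shift for numerical stability; probabilities are unchanged
    probs = np.exp(logits)
    probs /= probs.sum()
    return rng.choice(grid, p=probs), probs

rng = np.random.default_rng(0)
data = rng.uniform(0.0, 1.0, size=200)  # hypothetical dataset inside C = [0, 1]
grid = np.linspace(0.0, 1.0, 1001)      # assumption: discretized stand-in for C
theta_priv, probs = exp_mech_sample(
    data, grid, eps=1.0, lipschitz=1.0, diameter=1.0, rng=rng
)
```

The probability mass concentrates around the empirical median, and a smaller `eps` flattens the distribution toward uniform over the grid. The grid discretizes the domain, so this sketch only approximates the continuous mechanism described above.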

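Results

Research Questions

RQ1: What is the optimal excess risk achievable by differentially private convex ERM when only Lipschitz continuity and a bounded domain are assumed?
RQ2: Can efficient polynomial-time algorithms with matching lower bounds be designed for $(\epsilon,0)$- and $(\epsilon,\delta)$-differential privacy?
RQ3: Why do standard smoothing techniques fail to achieve optimal error rates on non-smooth losses such as the hinge loss or the median?
RQ4: How can efficient log-concave sampling be used to design optimal private ERM algorithms?
RQ5: What are the fundamental trade-offs among privacy, utility, and computational efficiency in private ERM?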

Key Findings

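The paper proves matching lower bounds on excess risk in both the Lipschitz and the strongly convex settings, showing that the proposed algorithms are information-theoretically optimal.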
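For $(\epsilon,\delta)$-differential privacy, the gradient-based algorithm achieves excess risk $\tilde{O}\left(\frac{L\|\mathcal{C}\|_2\sqrt{p}}{\epsilon}\right)$ on the summed (unnormalized) empirical loss, where $p$ is the dimension and $\tilde{O}$ hides factors logarithmic in $n$ and $1/\delta$; in some cases it matches the oracle complexity of the non-private method.
For $(\epsilon,0)$-differential privacy, the exponential mechanism with efficient log-concave sampling achieves excess risk $O\left(\frac{L\|\mathcal{C}\|_2\, p}{\epsilon}\right)$ on the same scale.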
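The paper shows that simple approaches to smoothing a non-smooth loss do not yield optimal error rates, so earlier smoothness-based methods fall short of optimal on problems such as SVMs and the high-dimensional median.
The proposed algorithms run in polynomial time and still attain optimal excess risk for losses that are neither smooth nor strongly convex, such as the hinge loss and the high-dimensional median objective.
Consequently, earlier algorithms that rely on a smoothness assumption cannot be applied directly to general Lipschitz convex ERM and still achieve optimal error rates.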
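
The gradient-based approach for $(\epsilon,\delta)$-privacy on a non-smooth loss can be sketched as noisy projected subgradient descent on the hinge loss. This is an illustration under stated assumptions, not the paper's algorithm as specified: the noise level `sigma` is left as a free parameter and is not calibrated to any $(\epsilon,\delta)$, so the code by itself certifies no privacy guarantee. The step size is a standard schedule for stochastic subgradient methods, and the function names, dataset and constants are hypothetical.

```python
import numpy as np

def project_l2_ball(theta, radius):
    """Euclidean projection onto the ball {theta : ||theta||_2 <= radius}."""
    norm = np.linalg.norm(theta)
    return theta if norm <= radius else theta * (radius / norm)

def noisy_subgradient_descent(X, y, radius, sigma, steps, rng):
    """Noisy projected stochastic subgradient descent on the summed hinge loss.

    Assumption: sigma is NOT calibrated to a privacy budget in this sketch.
    """
    n, p = X.shape
    lipschitz = 1.0          # hinge loss is 1-Lipschitz when every row of X has norm <= 1
    diameter = 2.0 * radius  # diameter of the L2 ball used as the domain C
    # Bound on the root second moment of the noisy gradient of the summed loss.
    g_bound = np.sqrt((n * lipschitz) ** 2 + p * sigma ** 2)
    theta = np.zeros(p)
    for t in range(1, steps + 1):
        i = rng.integers(n)  # pick one data point uniformly at random
        margin = y[i] * (X[i] @ theta)
        subgrad = -y[i] * X[i] if margin < 1.0 else np.zeros(p)
        # Unbiased estimate of the summed-loss subgradient, plus Gaussian noise.
        noisy = n * subgrad + rng.normal(0.0, sigma, size=p)
        step = diameter / (g_bound * np.sqrt(t))
        theta = project_l2_ball(theta - step * noisy, radius)
    return theta

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X /= np.maximum(1.0, np.linalg.norm(X, axis=1, keepdims=True))  # rows have norm <= 1
y = np.sign(X @ np.ones(5))  # hypothetical labels from a fixed linear rule
theta = noisy_subgradient_descent(X, y, radius=1.0, sigma=10.0, steps=2000, rng=rng)
```

No smoothing of the hinge loss is needed: the update only uses a subgradient, which exists everywhere for a convex Lipschitz loss. A private instantiation would have to choose `sigma` and the number of steps from a privacy analysis.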

This review was generated by AI and reviewed by a human editor.