QUICK REVIEW

[Paper Review] Accuracy First: Selecting a Differential Privacy Level for Accuracy-Constrained ERM

Katrina Ligett, Seth Neel|arXiv (Cornell University)|May 30, 2017

Privacy-Preserving Technologies in Data | Cited by 32

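One-Sentence Summary

This paper proposes a noise reduction framework that lets a data analyst performing empirical risk minimization (ERM) select the most private differential privacy level (ε) that still meets a fixed accuracy requirement. By combining correlated noise with an adaptive AboveThreshold mechanism, the method keeps the privacy overhead of the search small and achieves substantially stronger privacy than inverting theoretical bounds or running a baseline search (for example, e^ε ≈ 10 versus 495).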

ABSTRACT

Traditional approaches to differential privacy assume a fixed privacy requirement $ε$ for a computation, and attempt to maximize the accuracy of the computation subject to the privacy constraint. As differential privacy is increasingly deployed in practical settings, it may often be that there is instead a fixed accuracy requirement for a given computation and the data analyst would like to maximize the privacy of the computation subject to the accuracy constraint. This raises the question of how to find and run a maximally private empirical risk minimizer subject to a given accuracy requirement. We propose a general "noise reduction" framework that can apply to a variety of private empirical risk minimization (ERM) algorithms, using them to "search" the space of privacy levels to find the empirically strongest one that meets the accuracy constraint, incurring only logarithmic overhead in the number of privacy levels searched. The privacy analysis of our algorithm leads naturally to a version of differential privacy where the privacy parameters are dependent on the data, which we term ex-post privacy, and which is related to the recently introduced notion of privacy odometers. We also give an ex-post privacy analysis of the classical AboveThreshold privacy tool, modifying it to allow for queries chosen depending on the database. Finally, we apply our approach to two common objectives, regularized linear and logistic regression, and empirically compare our noise reduction methods to (i) inverting the theoretical utility guarantees of standard private ERM algorithms and (ii) a stronger, empirical baseline based on binary search.

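Motivation and Objectives

Close the gap between the theoretical treatment of differential privacy (fix ε, then maximize accuracy) and practical settings (fix accuracy, then maximize privacy).
Design a method that empirically finds the smallest ε consistent with the desired accuracy target while retaining rigorous privacy guarantees.
Minimize the privacy cost of the search itself, which is typically high in adaptive data analysis.
Introduce and analyze a new privacy notion, ex-post privacy, which accounts for data-dependent privacy parameters.
Demonstrate empirically that the proposed method achieves a markedly better privacy-accuracy trade-off than theoretical utility bounds or a standard baseline search.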

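Proposed Method

The method uses a noise reduction technique: starting from a highly private initial estimate, it removes noise step by step to produce progressively less private hypotheses, using correlated noise to avoid additional privacy cost.
It applies an interactive AboveThreshold algorithm that tests the hypotheses in order and privately identifies the first one that meets the accuracy threshold.
A modified analysis that accounts for data-dependent queries bounds the privacy cost of the search at a level logarithmic in the number of queries.
The framework is instantiated with standard private ERM algorithms, such as covariance perturbation and output perturbation, and applies to ridge regression and logistic regression.
The method introduces ex-post privacy, in which the privacy parameters depend on the data, and provides a formal analysis of the notion.
The algorithm outputs the first hypothesis that satisfies the accuracy constraint; its privacy loss equals that of the final hypothesis plus the cost of the AboveThreshold mechanism.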
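The correlated-noise idea in the first point can be illustrated with a minimal sketch for a single scalar statistic. It relies on a known property of Laplace noise: a draw at a small scale can be turned into a draw at a larger scale by adding, with a suitable probability, one further Laplace draw. Whether this matches the paper's exact procedure is an assumption, and the names `laplace` and `noise_reduction_chain` are hypothetical.

```python
import random


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noise_reduction_chain(value, sensitivity, epsilons, rng):
    """Return one noisy release of `value` per epsilon, most private first.

    `epsilons` must be strictly increasing. The noise terms are correlated:
    each more private release is the next less private one, possibly with
    one extra Laplace draw added on top.
    """
    if any(a >= b for a, b in zip(epsilons, epsilons[1:])):
        raise ValueError("epsilons must be strictly increasing")
    count = len(epsilons)
    noise = [0.0] * count
    # Start from the least private (least noisy) level.
    noise[-1] = laplace(sensitivity / epsilons[-1], rng)
    # Walk toward smaller epsilon, adding noise so that each marginal
    # is Laplace with scale sensitivity / epsilons[t].
    for t in range(count - 2, -1, -1):
        keep_prob = (epsilons[t] / epsilons[t + 1]) ** 2
        if rng.random() < keep_prob:
            noise[t] = noise[t + 1]
        else:
            noise[t] = noise[t + 1] + laplace(sensitivity / epsilons[t], rng)
    return [value + n for n in noise]


if __name__ == "__main__":
    rng = random.Random(0)
    for eps, release in zip([0.1, 1.0, 10.0],
                            noise_reduction_chain(5.0, 1.0, [0.1, 1.0, 10.0], rng)):
        print(f"epsilon={eps}: release={release:.3f}")
```

In the setting described above, an analyst would inspect the releases from the most private one onward and stop at the first that is accurate enough. The sketch covers only the noise generation, not the private accuracy test or any privacy accounting.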

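Experimental Results

Research Questions

RQ1: How can a data analyst find the most private differential privacy parameter ε that still meets a fixed accuracy requirement in ERM?
RQ2: What is the privacy cost of an adaptive search over ε, and how can it be minimized?
RQ3: Can a noise reduction framework be designed that generates private hypotheses with minimal additional privacy overhead?
RQ4: How does the method compare with theoretical utility bounds and empirical search baselines in terms of the privacy-accuracy trade-off?
RQ5: What is the nature of ex-post privacy, the new privacy notion that arises naturally from data-dependent selection of privacy parameters, and how can it be analyzed formally?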
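As background for RQ2, the classical AboveThreshold mechanism that the paper modifies can be sketched as follows. This is the textbook version with a noisy threshold and fresh Laplace noise per query, not the paper's interactive variant; taking the query answers as a precomputed list is a simplifying assumption, and the function names are hypothetical.

```python
import random


def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def above_threshold(query_values, threshold, epsilon, sensitivity=1.0, rng=None):
    """Return the index of the first query whose noisy answer clears the
    noisy threshold, or None if no query does.

    `query_values` are true query answers computed on the private data,
    each assumed to have the given sensitivity.
    """
    rng = rng or random.Random()
    # The threshold is perturbed once, up front.
    noisy_threshold = threshold + laplace(2.0 * sensitivity / epsilon, rng)
    for index, answer in enumerate(query_values):
        # Each query gets its own fresh noise draw.
        noisy_answer = answer + laplace(4.0 * sensitivity / epsilon, rng)
        if noisy_answer >= noisy_threshold:
            return index  # halt at the first above-threshold query
    return None


if __name__ == "__main__":
    scores = [0.10, 0.20, 0.90, 0.95]  # illustrative values only
    print(above_threshold(scores, threshold=0.5, epsilon=1.0,
                          rng=random.Random(1)))
```

Because the mechanism halts at the first reported crossing, it releases one index rather than every noisy answer, which is why a sequential search built on it can be much cheaper than answering each query privately on its own.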

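Key Findings

The proposed noise reduction method achieves substantially stronger privacy than the theoretical utility bounds: for ridge regression with α = 0.05, it lowers the privacy risk factor e^ε from roughly 495 to 10.0.
At an accuracy target of α = 0.075, with both methods held to the same target, it lowers e^ε to 4.65, compared with 56.6 for the DoublingMethod baseline.
The privacy cost of the hypothesis-testing phase (via InteractiveAboveThreshold) is higher than expected, mainly because the sensitivity bound derived from the hypothesis norm is overly conservative.
Empirically, the norms of the tested hypotheses are far below the theoretical upper bound, which suggests that tighter sensitivity estimates could further improve the privacy guarantee.
The method's privacy loss is nearly equal to that of the final hypothesis, with the search adding only logarithmic overhead.
The ex-post privacy analysis confirms that the method retains rigorous privacy guarantees even though ε is chosen adaptively based on the data.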
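The figures above are reported as the risk factor e^ε rather than as ε itself. A short conversion, using only the four numbers quoted in the findings (the dictionary labels are illustrative, and 495 is an approximate value), shows what they mean on the more familiar ε scale:

```python
import math

# e^epsilon values as quoted in the findings above.
reported = {
    "theory bound, alpha=0.05": 495.0,
    "noise reduction, alpha=0.05": 10.0,
    "DoublingMethod, alpha=0.075": 56.6,
    "noise reduction, alpha=0.075": 4.65,
}

# epsilon is the natural logarithm of the risk factor.
epsilons = {label: math.log(factor) for label, factor in reported.items()}

for label, eps in epsilons.items():
    print(f"{label}: e^eps = {reported[label]:g}, eps = {eps:.2f}")
```

On this scale the comparison is roughly ε ≈ 6.2 versus 2.3 at α = 0.05, and ε ≈ 4.0 versus 1.5 at α = 0.075.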

This review was generated by AI and reviewed by human editors.