QUICK REVIEW

[Paper Review] When Relaxations Go Bad: "Differentially-Private" Machine Learning.

Bargav Jayaraman, David Evans|arXiv (Cornell University)|Feb 24, 2019

Privacy-Preserving Technologies in Data | Cited by 6

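One-Sentence Summary

This paper examines the disconnect between theoretical privacy guarantees and measured privacy leakage in differentially private machine learning, showing that the large privacy budgets ($\epsilon$) and advanced mechanisms commonly used in practice yield weak privacy guarantees despite the strength of the formal framework. Experiments on logistic regression and neural networks reveal a large gap between the guaranteed upper bounds on privacy loss and the effective privacy loss measured with inference attacks, and show that current methods rarely deliver both acceptable utility and meaningful privacy guarantees.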

ABSTRACT

Differential privacy is a strong notion for privacy that can be used to prove formal guarantees, in terms of a privacy budget, $\epsilon$, about how much information is leaked by a mechanism. However, implementations of privacy-preserving machine learning often select large values of $\epsilon$ in order to get acceptable utility of the model, with little understanding of the impact of such choices on meaningful privacy. Moreover, in scenarios where iterative learning procedures are used, differential privacy variants that offer tighter analyses are used which appear to reduce the needed privacy budget but present poorly understood trade-offs between privacy and utility. In this paper, we quantify the impact of these choices on privacy in experiments with logistic regression and neural network models. Our main finding is that there is a huge gap between the upper bounds on privacy loss that can be guaranteed, even with advanced mechanisms, and the effective privacy loss that can be measured using current inference attacks. Current mechanisms for differentially private machine learning rarely offer acceptable utility-privacy trade-offs with guarantees for complex learning tasks: settings that provide limited accuracy loss provide meaningless privacy guarantees, and settings that provide strong privacy guarantees result in useless models. Code for the experiments can be found here: this https URL

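Research Motivation and Objectives

Investigate the practical impact of large privacy budgets ($\epsilon$) in differentially private machine learning, given that a formal guarantee still holds.
Evaluate how effective advanced differential privacy mechanisms are when complex models are trained with iterative procedures.
Quantify effective privacy loss with inference attacks and compare it against the theoretical upper bounds.
Assess the trade-off between model utility and meaningful privacy for logistic regression and neural networks.
Demonstrate that current implementations usually cannot provide acceptable utility and strong privacy guarantees at the same time.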

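Proposed Method

The authors run experiments on logistic regression and neural network models using standard differentially private optimization techniques.
Advanced privacy mechanisms such as the moments accountant are applied to tighten the privacy budget estimate over iterative training.
Inference attacks are used to measure effective privacy loss, estimating how much information about the training data an attacker can recover.
The theoretical privacy bound ($\epsilon$) is compared with the measured privacy loss at each value of $\epsilon$.
$\epsilon$ is varied systematically to evaluate the utility-privacy trade-off for both linear models and deep learning models.
The experiment code is released so that the privacy evaluation pipeline can be reproduced.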
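To illustrate why a tighter analysis lowers the reported budget for iterative training, the sketch below compares basic composition with the advanced composition theorem of Dwork, Rothblum, and Vadhan. It is not the moments accountant and is not taken from the paper's code; the function names and the example values (`eps_step`, `k`, `delta_slack`) are assumptions chosen only for illustration.

```python
import math


def naive_composition(eps_step: float, k: int) -> float:
    """Basic composition: k runs of an eps_step-DP mechanism are (k * eps_step)-DP."""
    return k * eps_step


def advanced_composition(eps_step: float, k: int, delta_slack: float) -> float:
    """Advanced composition: total epsilon for k runs, at the cost of an
    additional failure probability delta_slack in the overall guarantee."""
    return (eps_step * math.sqrt(2 * k * math.log(1 / delta_slack))
            + k * eps_step * (math.exp(eps_step) - 1))


if __name__ == "__main__":
    # Hypothetical settings: 1000 training steps, each 0.01-DP.
    eps_step, k, delta_slack = 0.01, 1000, 1e-5
    print("naive total epsilon:   ", naive_composition(eps_step, k))
    print("advanced total epsilon:", advanced_composition(eps_step, k, delta_slack))
```

With small per-step budgets and many steps, the advanced bound grows roughly with the square root of the number of steps rather than linearly, which is the kind of saving that relaxed analyses offer. For large per-step budgets the advanced bound can exceed the naive one, so in practice one would take the minimum of the two.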

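Experimental Results

Research Questions

RQ1: How does the choice of privacy budget $\epsilon$ affect the actual privacy leakage of differentially private machine learning models?
RQ2: To what extent do advanced privacy mechanisms reduce effective privacy leakage, compared with what the theoretical bounds suggest?
RQ3: How do inference attacks and theoretical upper bounds differ as measures of actual privacy leakage?
RQ4: What is the utility-privacy trade-off for differentially private logistic regression and neural networks?
RQ5: Can current differentially private training methods provide strong privacy guarantees and acceptable model utility at the same time?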

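Key Findings

Even with advanced mechanisms, a large gap remains between the theoretical privacy bounds and the privacy loss measured with inference attacks.
Settings that ensure strong theoretical privacy guarantees produce useless models, while settings with high utility provide meaningless privacy guarantees.
The large $\epsilon$ values common in practice still carry a formal guarantee, but that guarantee is too weak to be meaningful.
Advanced mechanisms such as the moments accountant reduce the theoretical privacy budget, but this does not translate into a clear reduction in measured leakage, and their privacy-utility trade-offs remain poorly understood.
The information recovered by current inference attacks stays far below the worst case that the theoretical $\epsilon$ bounds allow; the guarantees are upper bounds on leakage, not estimates of it.
The study concludes that current differentially private machine learning methods rarely achieve an acceptable balance of utility and privacy on complex learning tasks.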
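The gap between a guaranteed bound and a measured loss can be made concrete with membership inference. The sketch below scores a simple loss-threshold attack in the style of Yeom et al. by its advantage (true positive rate minus false positive rate) and compares it with the $e^{\epsilon} - 1$ bound on membership advantage that holds for an $\epsilon$-differentially private learner. The per-example losses are synthetic and invented for illustration, and the function names are assumptions; none of this reproduces the paper's experiments or numbers.

```python
import math
import random


def membership_advantage(member_losses, nonmember_losses, threshold):
    """Return TPR - FPR for the rule 'guess member when loss <= threshold'."""
    tpr = sum(loss <= threshold for loss in member_losses) / len(member_losses)
    fpr = sum(loss <= threshold for loss in nonmember_losses) / len(nonmember_losses)
    return tpr - fpr


def advantage_upper_bound(epsilon):
    """Bound e^epsilon - 1 on membership advantage under epsilon-DP, capped at 1."""
    return min(1.0, math.exp(epsilon) - 1.0)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic losses (made up): members have slightly lower loss on average.
    members = [rng.expovariate(1 / 0.50) for _ in range(5000)]
    nonmembers = [rng.expovariate(1 / 0.55) for _ in range(5000)]
    # Use the mean training loss as the attack threshold.
    threshold = sum(members) / len(members)
    measured = membership_advantage(members, nonmembers, threshold)
    for epsilon in (0.1, 1.0, 10.0):
        print(f"epsilon={epsilon}: bound={advantage_upper_bound(epsilon):.3f}, "
              f"measured={measured:.3f}")
```

Once $\epsilon$ exceeds $\ln 2 \approx 0.69$, the bound reaches 1 and no longer rules out any attack, which is one way to see why large budgets give guarantees with little practical meaning.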

This review was generated by AI and reviewed by a human editor.