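[Paper Summary] Encode, Shuffle, Analyze Privacy Revisited: Formalizations and Empirical Evaluation
This paper formalizes anonymity-based privacy amplification within the ESA framework, proposes techniques such as removal-based LDP and data fragmentation, and provides an empirical evaluation showing the practical privacy-utility tradeoffs of anonymous, locally differentially private reports.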
Recently, a number of approaches and techniques have been introduced for reporting software statistics with strong privacy guarantees. These range from abstract algorithms to comprehensive systems with varying assumptions and built upon local differential privacy mechanisms and anonymity. Based on the Encode-Shuffle-Analyze (ESA) framework, notable results formally clarified large improvements in privacy guarantees without loss of utility by making reports anonymous. However, these results either consist of systems with seemingly disparate mechanisms and attack models, or formal statements with little guidance to practitioners. Addressing this, we provide a formal treatment and offer prescriptive guidelines for privacy-preserving reporting with anonymity. We revisit the ESA framework with a simple, abstract model of attackers as well as assumptions covering it and other proposed systems of anonymity. In light of new formal privacy bounds, we examine the limitations of sketch-based encodings and ESA mechanisms such as data-dependent crowds. We also demonstrate how the ESA notion of fragmentation (reporting data aspects in separate, unlinkable messages) improves privacy/utility tradeoffs both in terms of local and central differential-privacy guarantees. Finally, to help practitioners understand the applicability and limitations of privacy-preserving reporting, we report on a large number of empirical experiments. We use real-world datasets with heavy-tailed or near-flat distributions, which pose the greatest difficulty for our techniques; in particular, we focus on data drawn from images that can be easily visualized in a way that highlights reconstruction errors. Showing the promise of the approach, and of independent interest, we also report on experiments using anonymous, privacy-preserving reporting to train high-accuracy deep neural networks on standard tasks: MNIST and CIFAR-10.
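Research Motivation and Goals
- Clarify, within the ESA framework, when and how anonymity amplifies the privacy of locally differentially private reports.
- Provide practical, actionable guidance for deploying privacy-preserving reporting with anonymity.
- Identify primitives that optimize the privacy-utility tradeoff for high-dimensional data distributions.
- Evaluate how fragmentation, one-hot encoding, and sketch-based methods affect privacy and utility.
- Demonstrate applicability to real tasks, including training deep learning models under strong central privacy guarantees.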
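Proposed Methods
- Revisit the ESA framework and propose a simple, abstract attacker model and assumptions for anonymity-based reporting.
- Define removal-based local DP and contrast it with replacement-based DP to capture the practical privacy of distributed monitoring.
- Introduce report encoding, attribute fragmentation, and report fragmentation as mechanisms for controlling privacy and utility.
- Evaluate sketch-based encodings and data-dependent crowds (Crowd IDs), highlighting their limitations for certain distributions.
- Propose fragmentation, that is, splitting data across multiple unlinkable reports, to improve the privacy-utility tradeoff.
- Provide an empirical evaluation on real-world heavy-tailed data distributions and on training neural networks with anonymous LDP reports.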
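To make the encode, shuffle, and analyze steps listed above more concrete, here is a minimal Python sketch of a histogram pipeline that combines one-hot encoding, per-bit randomized response, and attribute fragmentation. It is an illustration under stated assumptions and not the paper's implementation: the function names (`encode_fragments`, `shuffle_reports`, `estimate_histogram`, `run_pipeline`) are hypothetical, the shuffler is simulated by an in-process random permutation, and `epsilon` is only the per-bit randomized-response parameter. The overall local and central guarantees depend on the accounting in the paper and are not computed here.

```python
import math
import random


def keep_probability(epsilon):
    """Probability that binary randomized response reports the true bit."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomize_bit(bit, epsilon, rng):
    """Binary randomized response: keep the bit with probability
    e^eps / (1 + e^eps), otherwise flip it."""
    return bit if rng.random() < keep_probability(epsilon) else 1 - bit


def encode_fragments(value, domain_size, epsilon, rng):
    """Encode: one-hot encode `value`, randomize every bit independently,
    and emit each position as its own (index, noisy_bit) report."""
    return [
        (index, randomize_bit(1 if index == value else 0, epsilon, rng))
        for index in range(domain_size)
    ]


def shuffle_reports(reports, rng):
    """Shuffle: return the reports in random order, so that position in the
    batch carries no information about which user sent which fragment."""
    shuffled = list(reports)
    rng.shuffle(shuffled)
    return shuffled


def estimate_histogram(reports, domain_size, num_users, epsilon):
    """Analyze: debias the per-index counts of noisy 1-bits.

    With p = keep_probability(epsilon) and c the true count for an index,
    E[ones] = c * p + (num_users - c) * (1 - p), which is solved for c.
    Requires epsilon > 0, otherwise 2p - 1 is zero.
    """
    p = keep_probability(epsilon)
    ones = [0] * domain_size
    for index, bit in reports:
        ones[index] += bit
    return [(count - num_users * (1.0 - p)) / (2.0 * p - 1.0) for count in ones]


def run_pipeline(values, domain_size, epsilon, seed=0):
    """Run encode -> shuffle -> analyze over all users' values."""
    rng = random.Random(seed)
    reports = []
    for value in values:
        reports.extend(encode_fragments(value, domain_size, epsilon, rng))
    shuffled = shuffle_reports(reports, rng)
    return estimate_histogram(shuffled, domain_size, len(values), epsilon)


if __name__ == "__main__":
    # Hypothetical skewed data over a domain of four values.
    data = [0] * 1200 + [1] * 500 + [2] * 250 + [3] * 50
    print(run_pipeline(data, domain_size=4, epsilon=2.0))
```

Because every bit is sent as a separate report, the analyzer only ever sees a shuffled bag of `(index, bit)` pairs. Rare values such as the last bucket are estimated with the same absolute noise as common ones, which is one reason heavy-tailed distributions are hard for this kind of reporting.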
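Experimental Results
Research Questions
- RQ1: How can anonymity-based privacy amplification be formalized and exploited in practical statistics reporting?
- RQ2: Which simple primitives (removal-based LDP, one-hot encoding, fragmentation, anonymous shuffling) maximize privacy while preserving utility?
- RQ3: Do sketch-based encodings and data-dependent crowds (e.g., Crowd IDs) improve or degrade privacy and utility in practice?
- RQ4: How can anonymous LDP reports be used effectively to train high-accuracy models on standard tasks?
- RQ5: Which attack models and threat assumptions are realistic for practitioners deploying anonymous LDP systems?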
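Key Findings
- Under suitable conditions, anonymous shuffling can substantially strengthen central privacy guarantees without sacrificing utility.
- The removal-based LDP definition yields local privacy guarantees about twice as strong as those stated under the replacement-based definition.
- Attribute fragmentation and report fragmentation significantly improve the privacy-utility tradeoff for high-dimensional, sparse data representations.
- Sketch-based encodings may reduce communication, but the noise they add usually outweighs the privacy benefit unless they are carefully tuned to the data distribution.
- One-hot encoding combined with fragmentation offers strong utility, though it may require a higher local privacy budget; sketch encodings require careful parameter tuning and may yield smaller gains.
- Anonymous LDP reports can be used effectively, under strong central privacy guarantees, to train high-accuracy deep neural networks on standard tasks such as MNIST and CIFAR-10.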
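The factor of two between removal-based and replacement-based LDP mentioned in the findings follows from a short chaining argument. The block below sketches it using notation assumed here, which may differ from the paper's: $R$ is a local randomizer, $R_0$ is a fixed reference output distribution that does not depend on any user's input, and $S$ is any set of outputs.

```latex
% Replacement-based \varepsilon-LDP: for all inputs x, x' and all output sets S
\[
  \Pr[R(x) \in S] \;\le\; e^{\varepsilon} \, \Pr[R(x') \in S].
\]

% Removal-based \varepsilon-LDP: there is a reference distribution R_0 such that,
% for all inputs x and all output sets S
\[
  \Pr[R(x) \in S] \;\le\; e^{\varepsilon} \, \Pr[R_0 \in S]
  \quad\text{and}\quad
  \Pr[R_0 \in S] \;\le\; e^{\varepsilon} \, \Pr[R(x) \in S].
\]

% Chaining the two removal inequalities through R_0, for any inputs x, x':
\[
  \Pr[R(x) \in S]
  \;\le\; e^{\varepsilon} \, \Pr[R_0 \in S]
  \;\le\; e^{2\varepsilon} \, \Pr[R(x') \in S].
\]
```

So a mechanism that satisfies $\varepsilon$-removal LDP satisfies $2\varepsilon$-replacement LDP, and in the other direction any $\varepsilon$-replacement LDP mechanism is also $\varepsilon$-removal LDP (take $R_0 = R(x_0)$ for some fixed input $x_0$). The same mechanism can therefore be described by an $\varepsilon$ up to two times smaller under the removal definition.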
This summary was generated by AI and reviewed by a human editor.