QUICK REVIEW

[Paper Review] Encode, Shuffle, Analyze Privacy Revisited: Formalizations and Empirical Evaluation

Úlfar Erlingsson, Vitaly Feldman|arXiv (Cornell University)|Jan 10, 2020
Privacy-Preserving Technologies in Data | References: 45 | Cited by: 36
One-Line Summary

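This paper formalizes privacy amplification by anonymity in the ESA framework, introduces removal LDP and data-fragmentation techniques, and provides an empirical evaluation showing practical privacy-utility tradeoffs for anonymized local-DP reports.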

ABSTRACT

Recently, a number of approaches and techniques have been introduced for reporting software statistics with strong privacy guarantees. These range from abstract algorithms to comprehensive systems with varying assumptions and built upon local differential privacy mechanisms and anonymity. Based on the Encode-Shuffle-Analyze (ESA) framework, notable results formally clarified large improvements in privacy guarantees without loss of utility by making reports anonymous. However, these results either consist of systems with seemingly disparate mechanisms and attack models, or formal statements with little guidance to practitioners. Addressing this, we provide a formal treatment and offer prescriptive guidelines for privacy-preserving reporting with anonymity. We revisit the ESA framework with a simple, abstract model of attackers as well as assumptions covering it and other proposed systems of anonymity. In light of new formal privacy bounds, we examine the limitations of sketch-based encodings and ESA mechanisms such as data-dependent crowds. We also demonstrate how the ESA notion of fragmentation (reporting data aspects in separate, unlinkable messages) improves privacy/utility tradeoffs both in terms of local and central differential-privacy guarantees. Finally, to help practitioners understand the applicability and limitations of privacy-preserving reporting, we report on a large number of empirical experiments. We use real-world datasets with heavy-tailed or near-flat distributions, which pose the greatest difficulty for our techniques; in particular, we focus on data drawn from images that can be easily visualized in a way that highlights reconstruction errors. Showing the promise of the approach, and of independent interest, we also report on experiments using anonymous, privacy-preserving reporting to train high-accuracy deep neural networks on standard tasks: MNIST and CIFAR-10.
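
The Encode-Shuffle-Analyze pipeline described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes one categorical value per user and uses bitwise randomized response on a one-hot vector as the local randomizer, and the names `encode`, `shuffle`, `analyze`, and `keep_probability` are hypothetical. The shuffler here is only an in-process random permutation; in a real deployment it would be a separate party that also strips metadata such as network addresses.

```python
import math
import random
from typing import List


def keep_probability(eps: float) -> float:
    """Probability that a bit is reported truthfully (assumed parametrization); eps must be > 0."""
    return math.exp(eps) / (1.0 + math.exp(eps))


def encode(value: int, k: int, eps: float, rng: random.Random) -> List[int]:
    """Encoder (client side): one-hot encode `value` in [0, k), then flip each
    bit independently with probability 1 - keep_probability(eps)."""
    keep = keep_probability(eps)
    one_hot = [1 if j == value else 0 for j in range(k)]
    return [bit if rng.random() < keep else 1 - bit for bit in one_hot]


def shuffle(reports: List[List[int]], rng: random.Random) -> List[List[int]]:
    """Shuffler: randomly permute the batch so the analyzer receives reports
    without ordering or sender information."""
    shuffled = list(reports)
    rng.shuffle(shuffled)
    return shuffled


def analyze(reports: List[List[int]], eps: float) -> List[float]:
    """Analyzer: unbiased estimate of how many users hold each value, using
    E[ones_j] = true_j * keep + (n - true_j) * (1 - keep)."""
    n, k = len(reports), len(reports[0])
    keep = keep_probability(eps)
    ones = [sum(report[j] for report in reports) for j in range(k)]
    return [(c - n * (1.0 - keep)) / (2.0 * keep - 1.0) for c in ones]


if __name__ == "__main__":
    rng = random.Random(42)
    true_values = [0] * 6000 + [1] * 3000 + [2] * 1000
    reports = [encode(v, k=3, eps=2.0, rng=rng) for v in true_values]
    estimates = analyze(shuffle(reports, rng), eps=2.0)
    print([round(e) for e in estimates])  # approximately [6000, 3000, 1000]
```

With `eps=2.0` each bit is kept with probability about 0.88. Because the flipping noise is known and symmetric, the analyzer can subtract its expected contribution and recover an approximate histogram, even though no single report reveals much about its sender.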

Motivation and Objectives

  • Clarify when and how anonymity amplifies privacy in local-DP reporting within the ESA framework.
  • Provide practical, prescriptive guidelines for deploying privacy-preserving reporting with anonymity.
  • Identify primitives that optimize privacy-utility tradeoffs in high-dimensional data distributions.
  • Evaluate the impact of fragmentation, one-hot encodings, and sketch-based methods on privacy and utility.
  • Demonstrate applicability to real-world tasks and to training deep learning models under strong central privacy guarantees.

Proposed Method

  • Revisit the ESA framework and propose a simple abstract attacker model and assumptions for anonymity-based reporting.
  • Define removal-based local DP and contrast it with replacement-based DP to capture practical privacy in distributed monitoring.
  • Introduce report encodings, attribute fragmentation, and report fragmentation as mechanisms to control privacy and utility.
  • Assess sketch-based encodings and data-dependent crowds, highlighting their limitations for certain data distributions.
  • Propose fragmentation of data into multiple unlinkable reports to improve privacy-utility tradeoffs.
  • Provide empirical evaluations on real-world, heavy-tailed data distributions and on training neural networks with anonymous LDP reports.
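
Attribute fragmentation can be illustrated by splitting each user's record into separate messages that carry no user identifier and then shuffling all messages together. This sketch assumes records are dictionaries of categorical attributes and omits local randomization (in practice each fragment would also be randomized on the client); `fragment`, `fragment_and_shuffle`, and `marginal` are hypothetical names.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

# A fragment carries one (attribute, value) pair and deliberately no user ID.
Fragment = Tuple[str, str]


def fragment(record: Dict[str, str]) -> List[Fragment]:
    """Attribute fragmentation: one message per attribute of a single user."""
    return [(attr, value) for attr, value in record.items()]


def fragment_and_shuffle(records: List[Dict[str, str]], seed: int = 0) -> List[Fragment]:
    """Fragment every record, then shuffle all fragments together so the
    analyzer cannot tell which fragments came from the same user."""
    fragments = [frag for record in records for frag in fragment(record)]
    random.Random(seed).shuffle(fragments)
    return fragments


def marginal(fragments: List[Fragment], attr: str) -> Counter:
    """Per-attribute histograms remain computable from unlinkable fragments."""
    return Counter(value for a, value in fragments if a == attr)


if __name__ == "__main__":
    users = [
        {"os": "linux", "browser": "firefox"},
        {"os": "mac", "browser": "safari"},
        {"os": "linux", "browser": "chrome"},
    ]
    shuffled = fragment_and_shuffle(users)
    print(marginal(shuffled, "os"))  # Counter({'linux': 2, 'mac': 1})
```

The sketch also shows the cost of unlinkability: per-attribute marginals survive, but joint statistics across attributes (for example, os by browser) can no longer be read directly from the fragments.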

Experimental Results

Research Questions

  • RQ1: How can privacy amplification by anonymity be formalized and harnessed in practical statistical reporting?
  • RQ2: What simple primitives (removal LDP, one-hot encoding, fragmentation, anonymous shuffling) maximize privacy while preserving utility?
  • RQ3: Do sketch-based encodings and data-dependent grouping (e.g., Crowd IDs) improve or degrade privacy/utility in practice?
  • RQ4: How can anonymized LDP reports be effectively used to train high-accuracy models on standard tasks?
  • RQ5: What attack model and threat assumptions are realistic for practitioners deploying anonymous LDP systems?

Key Findings

  • Anonymous shuffling can significantly strengthen central privacy guarantees without sacrificing utility under suitable conditions.
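  • Defining LDP with respect to removal rather than replacement of a user's data can yield local privacy guarantees up to a factor of two stronger (smaller ε) for the same mechanism, since ε-removal LDP implies 2ε-replacement LDP.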
  • Attribute fragmentation and report fragmentation markedly improve privacy-utility tradeoffs for high-dimensional, sparse data representations.
  • Sketch-based encodings may reduce communication but often add noise that outweighs privacy benefits unless carefully tuned to the data distribution.
  • One-hot encoding with fragmentation provides strong utility, though it may require higher local privacy budgets; sketching requires careful parameter tuning and may be less beneficial.
  • Anonymous LDP reports can be effectively used to train high-accuracy deep neural networks on standard tasks such as MNIST and CIFAR-10 under strong central privacy.
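
The factor-of-two relationship between removal and replacement LDP can be checked numerically for a small one-hot randomizer. This sketch assumes bitwise randomized response in which each bit is kept with probability e^ε/(1+e^ε), and it takes the removal comparison to be against the report of an empty (all-zero) input; the paper's formal definition may use a more general reference distribution. All function names are hypothetical.

```python
import itertools
import math
from typing import Optional, Tuple


def output_prob(y: Tuple[int, ...], value: Optional[int], keep: float) -> float:
    """P[randomizer outputs y | input value] for bitwise randomized response on a
    one-hot vector. value=None stands for the empty (all-zero) input."""
    prob = 1.0
    for j, bit in enumerate(y):
        true_bit = 1 if value == j else 0
        prob *= keep if bit == true_bit else 1.0 - keep
    return prob


def worst_log_ratio(k: int, eps: float, a: Optional[int], b: Optional[int]) -> float:
    """Largest |log P[y | a] - log P[y | b]| over all 2^k possible outputs y."""
    keep = math.exp(eps) / (1.0 + math.exp(eps))
    return max(
        abs(math.log(output_prob(y, a, keep)) - math.log(output_prob(y, b, keep)))
        for y in itertools.product((0, 1), repeat=k)
    )


def replacement_eps(k: int, eps: float) -> float:
    """Privacy loss when one value is replaced by a different value."""
    return max(worst_log_ratio(k, eps, x, x2)
               for x in range(k) for x2 in range(k) if x != x2)


def removal_eps(k: int, eps: float) -> float:
    """Privacy loss when a value is compared with the empty input."""
    return max(worst_log_ratio(k, eps, x, None) for x in range(k))


if __name__ == "__main__":
    print(removal_eps(4, 1.0), replacement_eps(4, 1.0))
```

Running it prints 1.0 and 2.0 up to floating-point rounding. Replacing one value with another changes two bits of the one-hot vector, while removing it changes only one, which is why the replacement ε is exactly twice the removal ε for this randomizer.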

This review was generated by AI and checked by a human editor.