QUICK REVIEW

[Paper Review] Breaking the Communication-Privacy-Accuracy Trilemma

Weining Chen, Peter Kairouz | arXiv (Cornell University) | Jul 22, 2020
Privacy-Preserving Technologies in Data | References: 53 | Cited by: 32
One-Sentence Summary

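The paper proposes new encoding and decoding schemes that jointly handle local differential privacy and communication constraints. Under ε-LDP and b-bit constraints, these schemes achieve order-optimal accuracy for mean estimation and frequency estimation, which breaks the trade-off between privacy, communication, and accuracy.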

ABSTRACT

Two major challenges in distributed learning and estimation are 1) preserving the privacy of the local samples; and 2) communicating them efficiently to a central server, while achieving high accuracy for the end-to-end task. While there has been significant interest in addressing each of these challenges separately in the recent literature, treatments that simultaneously address both challenges are still largely missing. In this paper, we develop novel encoding and decoding mechanisms that simultaneously achieve optimal privacy and communication efficiency in various canonical settings. In particular, we consider the problems of mean estimation and frequency estimation under $\varepsilon$-local differential privacy and $b$-bit communication constraints. For mean estimation, we propose a scheme based on Kashin's representation and random sampling, with order-optimal estimation error under both constraints. For frequency estimation, we present a mechanism that leverages the recursive structure of Walsh-Hadamard matrices and achieves order-optimal estimation error for all privacy levels and communication budgets. As a by-product, we also construct a distribution estimation mechanism that is rate-optimal for all privacy regimes and communication constraints, extending recent work that is limited to $b=1$ and $\varepsilon=O(1)$. Our results demonstrate that intelligent encoding under joint privacy and communication constraints can yield a performance that matches the optimal accuracy achievable under either constraint alone.
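The abstract refers to the recursive structure of Walsh-Hadamard matrices. That structure is the Sylvester recursion H_{2m} = [[H_m, H_m], [H_m, −H_m]], and the same recursion lets H_d·x be computed with butterflies in O(d log d) time. This is consistent with the O(n + d log d) decoding cost reported for RHR.

The sketch below is minimal and written in Python/NumPy. Both function names are hypothetical. It shows only the Hadamard recursion that the scheme builds on; it is not the authors' RHR encoder or decoder.

```python
import numpy as np


def sylvester_hadamard(d: int) -> np.ndarray:
    """Build the d x d Hadamard matrix via the Sylvester recursion.

    H_1 = [1],  H_{2m} = [[H_m, H_m], [H_m, -H_m]]; d must be a power of two.
    """
    if d < 1 or d & (d - 1):
        raise ValueError("d must be a power of two")
    h = np.array([[1]], dtype=np.int64)
    while h.shape[0] < d:
        h = np.block([[h, h], [h, -h]])
    return h


def fwht(x: np.ndarray) -> np.ndarray:
    """Compute H_d @ x in O(d log d) time by applying the recursion as butterflies."""
    y = np.array(x, dtype=float)  # work on a copy so the caller's array is untouched
    d = y.shape[0]
    if d < 1 or d & (d - 1):
        raise ValueError("length must be a power of two")
    step = 1
    while step < d:
        for start in range(0, d, 2 * step):
            a = y[start:start + step].copy()
            b = y[start + step:start + 2 * step].copy()
            y[start:start + step] = a + b             # top half: H_m x_top + H_m x_bottom
            y[start + step:start + 2 * step] = a - b  # bottom half: H_m x_top - H_m x_bottom
        step *= 2
    return y
```

Because H_d·H_dᵀ = d·I, applying `fwht` twice returns d·x. An explicit d × d matrix costs O(d²) memory, while `fwht` needs only the length-d vector.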

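Motivation and Goals

  • Motivate and formalize the joint privacy-communication-accuracy trade-off in distributed learning and estimation.
  • Provide schemes that achieve optimal or near-optimal estimation error for canonical tasks under ε-LDP and b-bit communication constraints.
  • Characterize when one constraint dominates, and show how the less stringent constraint can be satisfied at no cost.
  • Demonstrate general schemes for both mean estimation and distribution/frequency estimation, and address the question of shared randomness.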

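Proposed Methods

  • Develop a public-coin scheme for mean estimation based on Kashin's representation and random sampling (SQKR), achieving order-optimal ℓ2 error under ε-LDP and b-bit constraints.
  • Preprocess the data with Kashin's representation, which spreads the information evenly across the coefficients and makes quantization and privatization robust.
  • Quantize and subsample the data, then privatize it with the 2^k-RR mechanism to send a k-bit report. The server then reconstructs an unbiased estimator.
  • For statistical mean estimation, give a variant that avoids shared randomness by partitioning the coordinates deterministically. This variant still achieves the optimal error under ε-LDP and b-bit constraints.
  • Introduce the Recursive Hadamard Response (RHR) scheme for frequency estimation. It exploits the recursive structure of Hadamard matrices to achieve order-optimal error for all privacy levels and communication budgets.
  • Show that the dominant constraint determines the error and the other constraint can be satisfied at no cost. In the frequency estimation setting this yields a practical scheme with O(n + d log d) decoding complexity.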
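The 2^k-RR step in the methods above is K-ary randomized response with K = 2^k output symbols, so every report fits in k bits. The sketch below is minimal and assumes the standard k-RR output distribution: the true symbol is kept with probability e^ε/(e^ε+K−1), and otherwise one of the other K−1 symbols is chosen uniformly. It also shows the matching unbiased server-side estimator.

The function names are hypothetical. The Kashin preprocessing, quantization, and subsampling that precede this step in SQKR are omitted.

```python
import numpy as np


def krr_privatize(values, k_bits, eps, rng):
    """Privatize symbols in {0, ..., 2**k_bits - 1} with K-ary randomized response.

    Each report keeps its true value with probability e^eps / (e^eps + K - 1)
    and is otherwise replaced by one of the other K - 1 symbols uniformly.
    The likelihood ratio between outputs is at most e^eps, so this is eps-LDP.
    """
    K = 2 ** k_bits
    values = np.asarray(values)
    p_keep = np.exp(eps) / (np.exp(eps) + K - 1)
    keep = rng.random(values.shape) < p_keep
    # An offset in {1, ..., K-1} guarantees the replacement differs from the true symbol.
    offsets = rng.integers(1, K, size=values.shape)
    return np.where(keep, values, (values + offsets) % K)


def krr_estimate(reports, k_bits, eps):
    """Unbiased estimate of the symbol frequencies from the privatized reports."""
    K = 2 ** k_bits
    p = np.exp(eps) / (np.exp(eps) + K - 1)  # P(report == true symbol)
    q = 1.0 / (np.exp(eps) + K - 1)          # P(report == one specific other symbol)
    observed = np.bincount(reports, minlength=K) / len(reports)
    # E[observed_j] = q + (p - q) * f_j, so invert that affine map.
    return (observed - q) / (p - q)
```

Example usage: with `rng = np.random.default_rng(0)`, privatize 200,000 draws from a 4-symbol distribution (k = 2 bits, ε = 1) and pass the reports to `krr_estimate`. The estimates sum to exactly 1, and they land close to the true frequencies.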

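Results

Research Questions

  • RQ1: What are the fundamental limits on estimation error under joint ε-LDP and b-bit communication constraints for canonical tasks (mean, frequency, and distribution estimation)?
  • RQ2: Can encoding schemes built on Kashin's representation or the recursive Hadamard structure achieve order-optimal performance for all privacy and communication budgets?
  • RQ3: When one constraint dominates the error, to what extent can the less stringent constraint be satisfied at no cost?
  • RQ4: How does the requirement of shared randomness affect the feasibility and optimality of the proposed schemes in the statistical and distributional settings?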

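Key Findings

  • For mean estimation, r_ME(ℓ2, ε, b) = Θ(d / (n · min(ε^2, ε, b))). The SQKR scheme attains this order in certain regimes and is information-theoretically optimal there.
  • Kashin's representation spreads information evenly across the coefficients. This allows the private estimate to be unbiased with low reconstruction variance, which improves the ℓ2 error under the joint constraints.
  • For frequency estimation, r_FE(ℓ2) = Θ(d / (n · min{e^ε, (e^ε−1)^2, 2^b, d})) and r_FE(ℓ1) = Θ(d / √(n · min{e^ε, (e^ε−1)^2, 2^b, d})). RHR is order-optimal over the full range of ε and b, and it decodes efficiently in O(n + d log d) time.
  • The Recursive Hadamard Response (RHR) extends to distribution estimation without shared randomness, achieving order-optimal ℓ1 and ℓ2 errors for all privacy regimes and communication budgets.
  • The results show that the fundamental trade-off is governed by the more stringent constraint, and the less stringent one can be satisfied at no cost. This explains why a small or even single-bit budget suffices in the high-privacy regime.
  • In the studied settings, the schemes' errors match information-theoretic lower bounds, which establishes their optimality.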
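The rate formulas above can be turned into a small calculator that shows which constraint binds. The sketch reads the mean-estimation rate as d / (n · min(ε², ε, b)). It reads the frequency-estimation rates as d / (n · m) for ℓ2 and d / √(n · m) for ℓ1, where m = min{e^ε, (e^ε−1)², 2^b, d}.

Because the rates are stated only up to constants (Θ), these numbers show scaling, not actual error values. The helper names are hypothetical.

```python
import math


def mean_rate(d, n, eps, b):
    """Order of the l2 mean-estimation error, d / (n * min(eps^2, eps, b)); constants dropped."""
    return d / (n * min(eps ** 2, eps, b))


def freq_rates(d, n, eps, b):
    """Orders of the (l2, l1) frequency-estimation errors; constants dropped."""
    m = min(math.exp(eps), (math.exp(eps) - 1) ** 2, 2 ** b, d)
    return d / (n * m), d / math.sqrt(n * m)


def binding_constraint(eps, b):
    """Report which constraint sets the mean-estimation rate (ties count as privacy)."""
    return "privacy" if min(eps ** 2, eps) <= b else "communication"
```

At ε = 0.5 the privacy term ε² = 0.25 is already smaller than b = 1. So `mean_rate(100, 1000, 0.5, 1)` and `mean_rate(100, 1000, 0.5, 8)` both give 0.4: extra bits buy nothing, which is the "less stringent constraint is free" effect. At ε = 5 with b = 1, communication binds instead.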


This review was generated by AI and reviewed by human editors.