QUICK REVIEW

[Paper Summary] Pure Differentially Private Summation from Anonymous Messages

Badih Ghazi, Noah Golowich|arXiv (Cornell University)|Jan 1, 2020

Privacy-Preserving Technologies in Data|References: 35|Cited by: 14

One-Sentence Summary

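This paper gives the first pure differentially private protocols in the shuffled model for binary summation and real-number summation with constant (O_ε(1)) error. It does so with a new multi-message shuffled protocol: for binary summation each user sends O_ε(log n) one-bit messages (O_ε(log n) bits in total), and for real summation each user sends O_ε(log³ n) messages of O(log log n) bits each. It also proves that any pure DP binary summation protocol with error n^{0.5−Ω(1)} needs Ω_ε(√log n) bits of communication per user, which gives the first separation between pure and approximate DP in the shuffled model and between the bounded-communication multi-message shuffled model and the central model.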
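To make the stated bounds concrete, the sketch below plugs one value of n into the asymptotic per-user costs. It is illustrative only: every hidden constant and all ε-dependence are set to 1 (an assumption, since the O_ε and Ω_ε notation hides them), logarithms are base 2, and `per_user_costs` is a hypothetical helper, not code from the paper.

```python
import math
from typing import Tuple

def per_user_costs(n: int) -> Tuple[float, float, float]:
    """Per-user bit counts from the stated asymptotics.

    Illustrative assumption: all hidden constants and eps-dependence are 1,
    and logarithms are base 2.
    """
    log_n = math.log2(n)
    # Binary summation: O_eps(log n) messages of 1 bit each.
    binary_bits = log_n
    # Real summation: O_eps(log^3 n) messages of O(log log n) bits each.
    real_bits = log_n ** 3 * math.ceil(math.log2(log_n))
    # Lower bound for binary summation: Omega_eps(sqrt(log n)) bits.
    lower_bound_bits = math.sqrt(log_n)
    return binary_bits, real_bits, lower_bound_bits

print(per_user_costs(2 ** 20))  # roughly (20.0, 40000.0, 4.47)
```

For n = 2^20 this gives 20 bits per user for binary summation against a lower bound of about 4.47 bits, so at this n the upper and lower bounds differ by a factor of √log n.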

ABSTRACT

The shuffled (aka anonymous) model has recently generated significant interest as a candidate distributed privacy framework with trust assumptions better than the central model but with achievable errors smaller than the local model. We study pure differentially private (DP) protocols in the shuffled model for summation, a basic and widely used primitive:

- For binary summation where each of $n$ users holds a bit as an input, we give a pure $ε$-DP protocol for estimating the number of ones held by the users up to an error of $O_ε(1)$, and each user sends $O_ε(\log n)$ messages each of 1 bit. This is the first pure protocol in the shuffled model with error $o(\sqrt{n})$ for constant $ε$. Using this protocol, we give a pure $ε$-DP protocol that performs summation of real numbers in $[0, 1]$ up to an error of $O_ε(1)$, and where each user sends $O_ε(\log^3 n)$ messages each of $O(\log\log n)$ bits.
- In contrast, we show that for any pure $ε$-DP protocol for binary summation in the shuffled model having absolute error $n^{0.5-Ω(1)}$, the per user communication has to be at least $Ω_ε(\sqrt{\log n})$ bits. This implies the first separation between the (bounded-communication) multi-message shuffled model and the central model, and the first separation between pure and approximate DP protocols in the shuffled model.

To prove our lower bound, we consider (a generalization of) the following question: given $γ$ in $(0, 1)$, what is the smallest $m$ for which there are two random variables $X^0, X^1$ supported on $\{0, \dots ,m\}$ such that (i) the total variation distance between $X^0$ and $X^1$ is at least $1-γ$, and (ii) the moment generating functions of $X^0$ and $X^1$ are within a constant factor of each other everywhere? We show that the answer is $m = Θ(\sqrt{\log(1/γ)})$.
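The two conditions in the lower-bound question can be checked numerically on small examples. The helpers below (hypothetical names `tv_distance`, `mgf`, `max_mgf_ratio`) are an illustration of the definitions, not the paper's proof technique. Note that they evaluate the MGF ratio only on a finite grid of t, which is evidence for, not a proof of, a bound that must hold for every real t.

```python
import math
from typing import Sequence

def tv_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Total variation distance between two pmfs on {0, ..., m}."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def mgf(p: Sequence[float], t: float) -> float:
    """Moment generating function E[exp(t X)] of X with pmf p on {0, ..., m}."""
    return sum(pk * math.exp(t * k) for k, pk in enumerate(p))

def max_mgf_ratio(p: Sequence[float], q: Sequence[float],
                  ts: Sequence[float]) -> float:
    """Largest factor by which the two MGFs differ (either direction) on grid ts."""
    worst = 1.0
    for t in ts:
        r = mgf(p, t) / mgf(q, t)
        worst = max(worst, r, 1.0 / r)
    return worst

p = [0.5, 0.0, 0.5]
q = [0.25, 0.5, 0.25]
ts = [i / 10 for i in range(-100, 101)]  # grid covering t in [-10, 10]
print(tv_distance(p, q))        # 0.5
print(max_mgf_ratio(p, q, ts))  # just below 2
```

Here m = 2 and the total variation distance is 0.5, i.e. γ = 0.5. The MGF ratio equals 2(1 + e^{2t})/(1 + e^t)^2, which lies in [1, 2) for every real t, so the two MGFs stay within a factor of 2 everywhere. The paper's result says that pushing γ toward 0 while keeping the MGFs within a constant factor forces m to grow like √log(1/γ).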

Research Motivation and Goals

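Design the first pure DP protocol in the shuffled model for binary summation with error o(√n), and in fact with constant (O_ε(1)) absolute error.
Extend the binary summation protocol to real numbers in [0, 1] while keeping constant error and efficient communication.
Establish communication lower bounds for pure DP in the shuffled model that separate it from the central model and from approximate DP protocols.
Address the fundamental question of how much communication pure DP requires in the shuffled model, specifically for summation.
Introduce a new analytic framework relating the total variation distance and the moment generating functions of discrete distributions, which may have broader applications.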

Proposed Method

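Design a multi-message shuffled protocol in which each user, using carefully calibrated noise, encodes its input bit into O_ε(log n) anonymous one-bit messages.
The shuffler randomly permutes all messages so that the analyzer cannot link any message to its sender; combined with the calibrated noise, this anonymity yields pure DP.
Apply composition to bound the total privacy loss across the bit positions used in the real summation protocol, allocating the privacy budget to minimize error.
Prove the communication lower bound by analyzing the ratio of the moment generating functions (MGFs) of two discrete distributions and relating it to their total variation distance.
Generalize this analysis to find the smallest m for which two distributions on {0, ..., m} can have total variation distance at least 1 − γ while their MGFs stay within a constant factor of each other, showing that m = Θ(√log(1/γ)).
Build the real summation protocol by applying the binary protocol independently to each bit of the input's binary representation, splitting the privacy budget across positions to minimize error.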
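The randomizer, shuffler and analyzer pipeline described above can be sketched as a toy simulation. This is not the paper's protocol: the noise here (k − 1 fair coin bits per user) is a hypothetical placeholder chosen only to show the message flow and the debiasing step, and it does not give pure DP. For example, an all-zero output is impossible whenever some user holds a 1, so the likelihood ratio between neighboring inputs is unbounded. The source instead relies on carefully calibrated noise to obtain pure ε-DP. All function names are illustrative.

```python
import random
from typing import List, Tuple

def randomizer(bit: int, k: int, rng: random.Random) -> List[int]:
    """Toy local randomizer: the true bit plus k - 1 fair coin bits (placeholder noise)."""
    return [bit] + [rng.randint(0, 1) for _ in range(k - 1)]

def shuffler(messages: List[int], rng: random.Random) -> List[int]:
    """Trusted shuffler: outputs all messages in a uniformly random order."""
    out = list(messages)
    rng.shuffle(out)
    return out

def analyzer(shuffled: List[int], n: int, k: int) -> float:
    """Debiased estimate: observed ones minus the expected n * (k - 1) / 2 noise ones."""
    return sum(shuffled) - n * (k - 1) / 2

def run_protocol(bits: List[int], k: int, seed: int = 0) -> Tuple[float, List[int]]:
    rng = random.Random(seed)
    messages: List[int] = []
    for b in bits:
        messages.extend(randomizer(b, k, rng))  # each user sends k one-bit messages
    shuffled = shuffler(messages, rng)
    return analyzer(shuffled, len(bits), k), shuffled

bits = [int(i % 3 == 0) for i in range(1000)]  # 334 users hold a 1
estimate, shuffled = run_protocol(bits, k=5)
print(sum(bits), estimate)
```

Because every message is a single bit, the shuffled list reveals nothing beyond the total number of ones, which is why the analyzer's estimate depends only on that count.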

Results

Research Questions

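RQ1 Can a pure DP protocol in the shuffled model achieve constant error for binary summation with sublinear communication?
RQ2 What is the optimal communication complexity of pure DP binary and real summation in the shuffled model?
RQ3 Is there a provable separation between pure and approximate DP protocols in the shuffled model?
RQ4 Can communication lower bounds for pure DP in the shuffled model be established through moment generating function analysis?
RQ5 For pure DP, is the bounded-communication multi-message shuffled model strictly weaker than the central model?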

Key Findings

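The paper gives a pure ε-DP protocol for binary summation in the shuffled model with absolute error O_ε(1), in which each user sends O_ε(log n) one-bit messages.
For summation of real numbers in [0, 1], the protocol achieves expected error O(√(log(1/ε)) / ε^{3/2}), with each user sending O_ε(log³ n) messages of O(log log n) bits each.
Any pure ε-DP binary summation protocol in the shuffled model with absolute error n^{0.5−Ω(1)} requires Ω_ε(√log n) bits of communication per user, which separates the bounded-communication shuffled model from the central model.
This lower bound follows from determining the smallest m for which two distributions on {0, ..., m} can have total variation distance at least 1 − γ while their MGFs stay within a constant factor of each other; the answer is m = Θ(√log(1/γ)).
The lower bound also gives the first separation between pure and approximate DP in the shuffled model, since approximate DP protocols achieve small error with less per-user communication than this bound allows.
The real summation protocol applies the binary protocol independently to each bit of the binary representation and allocates the privacy budget across positions to minimize error, with O_ε(log³ n) messages per user.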
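The bit-by-bit reduction from real summation to binary summation can be sketched as follows. Assumptions: inputs are truncated to `precision` bits (the paper's exact rounding and budget allocation are not reproduced here), and `binary_sum` is a hypothetical stand-in for any binary-summation protocol. Passing Python's built-in `sum` gives the exact, non-private count.

```python
from typing import Callable, List, Sequence

def to_bits(x: float, precision: int) -> List[int]:
    """Truncate x in [0, 1] to `precision` bits, most significant first.

    x = 1.0 is clamped to the largest representable value, 1 - 2**-precision.
    """
    scaled = min(int(x * 2 ** precision), 2 ** precision - 1)
    return [(scaled >> (precision - 1 - j)) & 1 for j in range(precision)]

def real_sum_via_binary(
    xs: Sequence[float],
    precision: int,
    binary_sum: Callable[[List[int]], float],
) -> float:
    """Estimate sum(xs) by running one binary summation per bit position."""
    bit_rows = [to_bits(x, precision) for x in xs]
    total = 0.0
    for j in range(precision):
        column = [row[j] for row in bit_rows]          # every user's j-th bit
        total += binary_sum(column) * 2.0 ** -(j + 1)  # weight of bit position j
    return total

print(real_sum_via_binary([0.25, 0.5, 0.75, 0.125], 3, sum))  # 1.625
```

Because the count for bit position j is weighted by 2^{−(j+1)}, noise from a private binary protocol at the high-order positions dominates the final error, which is one reason the allocation of the privacy budget across positions matters. In this sketch truncation adds at most n · 2^{−precision} error, so precision ≈ log₂ n keeps that term O(1).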

This summary was generated by AI and reviewed by human editors.