QUICK REVIEW

[Paper Review] Privacy-Utility Tradeoffs under Constrained Data Release Mechanisms

Ye Wang, Y. Ozan Basciftci|arXiv (Cornell University)|Oct 25, 2017

Privacy-Preserving Technologies in Data | References: 16 | Cited by: 28

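One-Sentence Summary

This paper studies privacy-utility tradeoffs in data release mechanisms under constrained data access, showing that the best tradeoff is achieved when the full data is available, followed by access to only the useful data, and lastly access to only the sensitive data. It establishes a hierarchy of tradeoff regions and identifies conditions, based on common information, under which output perturbation matches the performance of full data access; it also shows that non-symmetric privacy measures such as maximal leakage can violate a new “linkage inequality” that these results require.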

ABSTRACT

Privacy-preserving data release mechanisms aim to simultaneously minimize information-leakage with respect to sensitive data and distortion with respect to useful data. Dependencies between sensitive and useful data result in a privacy-utility tradeoff that has strong connections to generalized rate-distortion problems. In this work, we study how the optimal privacy-utility tradeoff region is affected by constraints on the data that is directly available as input to the release mechanism. In particular, we consider the availability of only sensitive data, only useful data, and both (full data). We show that a general hierarchy holds: the tradeoff region given only the sensitive data is no larger than the region given only the useful data, which in turn is clearly no larger than the region given both sensitive and useful data. In addition, we determine conditions under which the tradeoff region given only the useful data coincides with that given full data. These are based on the common information between the sensitive and useful data. We establish these results for general families of privacy and utility measures that satisfy certain natural properties required of any reasonable measure of privacy or utility. We also uncover a new, subtler aspect of the data processing inequality for general non-symmetric privacy measures and discuss its operational relevance and implications. Finally, we derive exact closed-analytic-form expressions for the privacy-utility tradeoffs for symmetrically dependent sensitive and useful data under mutual information and Hamming distortion as the respective privacy and utility measures.

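Motivation and Objectives

Analyze how the privacy-utility tradeoff region is affected when the release mechanism's access to the sensitive data or the useful data is restricted.
Establish the basic hierarchy among three data access scenarios: full data, useful data only (output perturbation), and sensitive data only (inference).
Identify conditions, based on the common information between the sensitive and useful data, under which output perturbation mechanisms have the same tradeoff region as full data mechanisms.
Study the operational implications of the new “linkage inequality” for non-symmetric privacy measures such as maximal leakage and differential privacy.
Derive exact closed-form expressions for the privacy-utility tradeoff for symmetrically dependent data under mutual information and Hamming distortion.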

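Proposed Approach

Generalize the privacy-utility framework to allow arbitrary constraints on data observation, modeling input scenarios in which only the sensitive data, only the useful data, or both are available.
Introduce a general privacy measure $ J(X;Z) $ and a general utility measure $ D(P_{Y,Z}) $ that need only satisfy the natural axiomatic properties expected of any reasonable measure.
Use information-theoretic inequalities to establish the hierarchy of tradeoff regions, proving that full data mechanisms perform at least as well as output perturbation and inference mechanisms in terms of privacy and utility.
Identify a new “linkage inequality” $ J(X;Z) \leq J(Y;Z) $ for non-symmetric privacy measures, which is distinct from the standard post-processing inequality.
Derive exact closed-form solutions for the privacy-utility tradeoff under the symmetric pair distribution $ (X,Y) \sim SP(m,p) $, using mutual information and Hamming distortion.
Analyze three mechanisms and compare their tradeoff regions: full data, output perturbation (Z depends only on Y), and inference (Z depends only on X).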
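
The three mechanism classes can be made concrete with a small numerical sketch. It is illustrative only and not taken from the paper: it assumes that $ SP(m,p) $ means X is uniform over m symbols and Y differs from X with total probability p, spread evenly over the other m-1 symbols. It also uses mutual information in nats as the privacy measure and the probability that Y and Z differ as the Hamming distortion. The helper names (`symmetric_pair`, `evaluate`, `symmetric_channel`) are hypothetical.

```python
import numpy as np

def symmetric_pair(m, p):
    """Joint pmf of (X, Y) under the ASSUMED reading of SP(m, p):
    X uniform on {0..m-1}; Y != X with probability p, uniform over the rest."""
    joint = np.full((m, m), p / (m * (m - 1)))
    np.fill_diagonal(joint, (1 - p) / m)
    return joint

def mutual_information(joint):
    """I(A;B) in nats for a joint pmf stored as a 2-D array."""
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / (pa @ pb)[mask])))

def evaluate(p_xy, mech):
    """mech[x, y, z] = P(Z=z | X=x, Y=y).
    Returns (I(X;Z), I(Y;Z), P(Y != Z))."""
    p_xyz = p_xy[:, :, None] * mech
    p_xz = p_xyz.sum(axis=1)
    p_yz = p_xyz.sum(axis=0)
    hamming = 1.0 - float(np.trace(p_yz))
    return mutual_information(p_xz), mutual_information(p_yz), hamming

def symmetric_channel(m, t):
    """m-ary channel that keeps its input with probability 1 - t."""
    w = np.full((m, m), t / (m - 1))
    np.fill_diagonal(w, 1 - t)
    return w

m, p = 4, 0.2
p_xy = symmetric_pair(m, p)
w = symmetric_channel(m, 0.1)

# Output perturbation: Z is generated from Y only (X - Y - Z is a Markov chain).
out_pert = np.broadcast_to(w[None, :, :], (m, m, m))
# Inference: Z is generated from X only.
inference = np.broadcast_to(w[:, None, :], (m, m, m))
# A full data mechanism may use any array mech[x, y, z] whose last axis sums to 1.

leak_op, link_op, dist_op = evaluate(p_xy, out_pert)
leak_inf, _, dist_inf = evaluate(p_xy, inference)
print(f"output perturbation: I(X;Z)={leak_op:.4f}, I(Y;Z)={link_op:.4f}, distortion={dist_op:.4f}")
print(f"inference:           I(X;Z)={leak_inf:.4f}, distortion={dist_inf:.4f}")
```

For this particular channel, the inference mechanism leaks more about X and distorts Y more than the output perturbation mechanism does, which is consistent with the hierarchy described above. The output perturbation run also satisfies $ I(X;Z) \leq I(Y;Z) $, the form that the linkage inequality takes when mutual information is the privacy measure.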

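Results

Research Questions

RQ1: How does the privacy-utility tradeoff region change when the release mechanism can access only the sensitive data, only the useful data, or both?
RQ2: Under what conditions can an output perturbation mechanism achieve the same privacy-utility tradeoff as a full data mechanism?
RQ3: What role does the common information between the sensitive and useful data play in determining whether the tradeoff regions are equivalent?
RQ4: How do non-symmetric privacy measures, such as maximal leakage and differential privacy, behave with respect to the newly identified “linkage inequality”?
RQ5: What is the exact closed-form expression for the privacy-utility tradeoff for symmetrically dependent data under mutual information and Hamming distortion?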

Key Findings

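The privacy-utility tradeoff region of the full data mechanism is at least as large as that of the output perturbation mechanism, which in turn is at least as large as that of the inference mechanism.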
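The output perturbation mechanism achieves the same tradeoff region as the full data mechanism when the common information between X and Y equals their mutual information.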
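For symmetrically dependent data $ (X,Y) \sim SP(m,p) $, the optimal output perturbation mechanism adds noise with distribution $ P_N(n) = 1-t $ for $ n=0 $ and $ P_N(n) = t/(m-1) $ otherwise, where $ t = \min(\delta, 1 - 1/m) $.
The inference mechanism achieves a finite privacy-utility tradeoff only when $ p \notin (\delta, (m-1)(1 - \delta)) $; otherwise the tradeoff is infinite, meaning that no mechanism is feasible within the distortion budget.
Sibson mutual information of order infinity and the information privacy measure satisfy both the post-processing and linkage inequalities, whereas maximal leakage and differential privacy can violate the linkage inequality.
The paper gives an exact closed-form expression for the optimal privacy-utility tradeoff under mutual information and Hamming distortion: $ \pi_{\text{OP}}(\delta) = r_m\left(p + \delta\left(1 - \frac{pm}{m-1}\right)\right) $ when $ \delta < 1 - 1/m $, and zero otherwise.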
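
The noise distribution and the closed-form expression above can be cross-checked numerically. The sketch below rests on three assumptions not stated in this review: the released value is $ Z = (Y + N) \bmod m $, $ SP(m,p) $ means X is uniform and Y differs from X with probability p spread evenly over the other symbols, and $ r_m(q) = \log m - h(q) - q \log(m-1) $ with $ h $ the binary entropy in nats. The last is the mutual information across an m-ary symmetric channel with crossover probability q. The function names are hypothetical. The code checks only that this mechanism's leakage matches the formula; it does not verify optimality.

```python
import math
import numpy as np

def r_m(q, m):
    """ASSUMED definition: I(X;Z) in nats for uniform X sent through an
    m-ary symmetric channel with crossover probability q."""
    h = 0.0
    if 0.0 < q < 1.0:
        h = -q * math.log(q) - (1 - q) * math.log(1 - q)
    return math.log(m) - h - q * math.log(m - 1)

def noise_pmf(m, delta):
    """P_N(0) = 1 - t and P_N(n) = t / (m - 1) otherwise, t = min(delta, 1 - 1/m)."""
    t = min(delta, 1 - 1 / m)
    pmf = np.full(m, t / (m - 1))
    pmf[0] = 1 - t
    return pmf

def pi_op(m, p, delta):
    """Closed-form leakage quoted in the findings above."""
    if delta >= 1 - 1 / m:
        return 0.0
    return r_m(p + delta * (1 - p * m / (m - 1)), m)

def leakage_by_enumeration(m, p, delta):
    """Brute-force I(X;Z) for Z = (Y + N) mod m (modular addition is an assumption)."""
    p_n = noise_pmf(m, delta)
    p_xz = np.zeros((m, m))
    for x in range(m):
        for y in range(m):
            p_xy = (1 - p) / m if x == y else p / (m * (m - 1))
            for n in range(m):
                p_xz[x, (y + n) % m] += p_xy * p_n[n]
    px = p_xz.sum(axis=1, keepdims=True)
    pz = p_xz.sum(axis=0, keepdims=True)
    mask = p_xz > 0
    return float(np.sum(p_xz[mask] * np.log(p_xz[mask] / (px @ pz)[mask])))

m, p, delta = 4, 0.2, 0.1
closed_form = pi_op(m, p, delta)
brute_force = leakage_by_enumeration(m, p, delta)
print(f"closed form: {closed_form:.6f} nats, enumeration: {brute_force:.6f} nats")
```

The argument of $ r_m $ is the probability that X and Z differ: X and Z agree either when Y = X and no noise is added, or when Y differs from X and the noise happens to cancel the difference, giving $ 1 - (1-p)(1-\delta) - p\delta/(m-1) = p + \delta\left(1 - \frac{pm}{m-1}\right) $.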

This review was generated by AI and reviewed by a human editor.