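[Paper Summary] Make up your mind: the price of online queries in differential privacy
This paper proves that three models of differential privacy (offline, online, and adaptive) are genuinely different in power. It shows that exponentially more statistical queries can be answered accurately in the offline model than in the online model, and exponentially more search queries can be answered accurately in the online model than in the adaptive model, challenging the assumption that these models are equivalent.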
We consider the problem of answering queries about a sensitive dataset subject to differential privacy. The queries may be chosen adversarially from a larger set Q of allowable queries in one of three ways, which we list in order from easiest to hardest to answer:
- Offline: The queries are chosen all at once and the differentially private mechanism answers the queries in a single batch.
- Online: The queries are chosen all at once, but the mechanism only receives the queries in a streaming fashion and must answer each query before seeing the next query.
- Adaptive: The queries are chosen one at a time and the mechanism must answer each query before the next query is chosen. In particular, each query may depend on the answers given to previous queries.
Many differentially private mechanisms are just as efficient in the adaptive model as they are in the offline model. Meanwhile, most lower bounds for differential privacy hold in the offline setting. This suggests that the three models may be equivalent.
We prove that these models are all, in fact, distinct. Specifically, we show that there is a family of statistical queries such that exponentially more queries from this family can be answered in the offline model than in the online model. We also exhibit a family of search queries such that exponentially more queries from this family can be answered in the online model than in the adaptive model. We also investigate whether such separations might hold for simple queries like threshold queries over the real line.
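The difference between the three interaction models can be made concrete with a small sketch. The code below is illustrative only and is not taken from the paper: `Mechanism`, `run_offline`, `run_online`, and `run_adaptive` are hypothetical names, and the placeholder mechanism returns exact answers with no privacy noise so that the control flow stays visible. What changes between the models is when the analyst commits to the queries and how much the mechanism sees before it has to answer.

```python
from typing import Callable, List, Sequence

Dataset = Sequence[float]
Query = Callable[[Dataset], float]


class Mechanism:
    """Hypothetical placeholder: answers exactly, adds no privacy noise."""

    def __init__(self, data: Dataset) -> None:
        self.data = data

    def answer_batch(self, queries: List[Query]) -> List[float]:
        # Offline: the mechanism sees every query before producing any answer.
        return [q(self.data) for q in queries]

    def answer_one(self, query: Query) -> float:
        # Online and adaptive: the mechanism must commit to this answer
        # before it learns anything about later queries.
        return query(self.data)


def run_offline(mech: Mechanism, queries: Sequence[Query]) -> List[float]:
    # All queries are chosen up front and handed over as a single batch.
    return mech.answer_batch(list(queries))


def run_online(mech: Mechanism, queries: Sequence[Query]) -> List[float]:
    # All queries are chosen up front, but revealed to the mechanism one by one.
    return [mech.answer_one(q) for q in queries]


def run_adaptive(
    mech: Mechanism,
    choose_next: Callable[[List[float]], Query],
    rounds: int,
) -> List[float]:
    # Each query is chosen only after the analyst has seen all earlier answers.
    answers: List[float] = []
    for _ in range(rounds):
        query = choose_next(list(answers))
        answers.append(mech.answer_one(query))
    return answers
```

With an exact mechanism the offline and online runs return the same answers for the same queries. The separations described in the abstract concern how many queries a differentially private mechanism can answer accurately under each of these interfaces.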
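Motivation and Objectives
- Investigate whether the three models of differential privacy (offline, online, and adaptive) are equivalent in the number of queries they can answer accurately.
- Challenge the apparent equivalence suggested by the fact that many mechanisms are just as efficient in the adaptive model as they are in the offline model.
- Establish formal separations between the models by constructing query families for which the number of accurately answerable queries differs exponentially.
- Explore whether such separations also hold for simple query types, such as threshold queries over the real line.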
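Proposed Approach
- Construct a family of statistical queries for which exponentially more queries can be answered accurately in the offline model than in the online model.
- Design a family of search queries for which exponentially more queries can be answered accurately in the online model than in the adaptive model.
- Use information-theoretic arguments to bound the number of queries that can be answered in each model.
- Analyze the dependency structure of queries in the adaptive model, where each query may depend on earlier answers, to explain why this model is harder.
- Use query complexity (the number of queries that can be answered accurately) as the yardstick for comparing differentially private mechanisms across the three models.
- Show that the way queries are chosen (batch vs. streaming vs. adaptive) fundamentally affects the privacy-utility trade-off.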
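As a baseline for what "answering many queries accurately" means, here is a minimal sketch of the standard Laplace mechanism for statistical queries under basic composition. This is a textbook technique, not the construction used in the paper, and the function names are hypothetical. The assumption is that each statistical query averages a predicate with values in [0, 1] over n records, so it has sensitivity 1/n; splitting a total budget epsilon evenly across k queries then gives each answer Laplace noise of scale k / (n * epsilon).

```python
import random
from typing import Callable, List, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponential variables with mean `scale`
    # follows a Laplace distribution with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noise_scale(n: int, k: int, epsilon: float) -> float:
    # Sensitivity 1/n per query, per-query budget epsilon/k (basic composition).
    return k / (n * epsilon)


def answer_statistical_queries(
    data: Sequence[float],
    predicates: Sequence[Callable[[float], float]],
    epsilon: float,
    rng: random.Random,
) -> List[float]:
    """Answer k statistical queries with total privacy budget epsilon."""
    n, k = len(data), len(predicates)
    scale = noise_scale(n, k, epsilon)
    answers = []
    for predicate in predicates:
        # Statistical query: the average of the predicate over the dataset.
        true_value = sum(predicate(x) for x in data) / n
        answers.append(true_value + laplace_noise(scale, rng))
    return answers
```

Because the noise for each answer is drawn independently of the other queries, this simple mechanism can be run unchanged in the offline, online, and adaptive settings. Its error grows linearly with the number of queries k, which is why the number of accurately answerable queries is a natural measure for comparing the models.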
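Results
Research Questions
- RQ1: Are the offline, online, and adaptive models of differential privacy equivalent in the number of queries they can answer accurately?
- RQ2: Is there an exponential gap between the offline and online models in the number of queries that can be answered accurately?
- RQ3: For some query families, is there an exponential separation between the online and adaptive models?
- RQ4: Do such separations persist for simple query types, such as threshold queries over the real line?
Key Findings
- There is a family of statistical queries for which exponentially more queries can be answered accurately in the offline model than in the online model.
- There is a family of search queries for which exponentially more queries can be answered accurately in the online model than in the adaptive model.
- Under differential privacy, the three models (offline, online, and adaptive) differ fundamentally in query-answering power.
- In terms of utility, the adaptive model is strictly more restrictive than the online model for some query families, and the online model is strictly more restrictive than the offline model for others.
- The paper also investigates whether similar gaps arise for simple queries such as threshold queries, although the abstract gives no explicit quantitative result for them.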
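For readers unfamiliar with threshold queries, the sketch below shows what the query class looks like. It computes exact, non-private answers and says nothing about the paper's results for this class; `threshold_query` is a hypothetical helper name. A threshold query with parameter t asks for the fraction of data points x with x <= t.

```python
from bisect import bisect_right
from typing import Sequence


def threshold_query(sorted_data: Sequence[float], t: float) -> float:
    """Exact (non-private) fraction of points x in the data with x <= t."""
    # bisect_right returns the number of elements <= t in a sorted sequence.
    return bisect_right(sorted_data, t) / len(sorted_data)


data = sorted([0.9, 0.1, 0.6, 0.4])
answers = [threshold_query(data, t) for t in (0.0, 0.5, 0.8, 1.0)]
print(answers)
```

The answers are non-decreasing in t, which is the kind of simple structure that makes threshold queries a natural test case for whether the separations persist.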
This summary was generated by AI and reviewed by human editors.