QUICK REVIEW

[Paper Digest] Private False Discovery Rate Control

Cynthia Dwork, Weijie Su|arXiv (Cornell University)|Nov 12, 2015

Privacy-Preserving Technologies in Data|References 14|Cited by 21

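One-Sentence Summary

This paper presents the first differentially private algorithms for controlling the false discovery rate (FDR) in multiple hypothesis testing, obtained by making each step of a variant of the Benjamini-Hochberg procedure (BHq) differentially private. It also gives a new proof of FDR control for the non-private BHq procedure under a weak assumption, and a low-distortion "one-shot" top-$k$ primitive, which together yield private FDR control with essentially no loss in power under certain conditions.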

ABSTRACT

We provide the first differentially private algorithms for controlling the false discovery rate (FDR) in multiple hypothesis testing, with essentially no loss in power under certain conditions. Our general approach is to adapt a well-known variant of the Benjamini-Hochberg procedure (BHq), making each step differentially private. This destroys the classical proof of FDR control. To prove FDR control of our method, (a) we develop a new proof of the original (non-private) BHq algorithm and its robust variants -- a proof requiring only the assumption that the true null test statistics are independent, allowing for arbitrary correlations between the true nulls and false nulls. This assumption is fairly weak compared to those previously shown in the vast literature on this topic, and explains in part the empirical robustness of BHq. Then (b) we relate the FDR control properties of the differentially private version to the control properties of the non-private version. We also present a low-distortion "one-shot" differentially private primitive for "top $k$" problems, e.g., "Which are the $k$ most popular hobbies?" (which we apply to: "Which hypotheses have the $k$ most significant $p$-values?"), and use it to get a faster privacy-preserving instantiation of our general approach at little cost in accuracy. The proof of privacy for the one-shot top $k$ algorithm introduces a new technique of independent interest.
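
For reference, the quantities the abstract refers to can be written out as follows. This is the standard textbook formulation of FDR and of the step-up BHq rule, not notation taken from the paper; the symbols $m$, $R$, $V$, $q$, $p_{(i)}$ and $\hat{k}$ are introduced here for illustration only.

```latex
% R = number of rejected hypotheses, V = number of true nulls among them.
\[
  \mathrm{FDR} \;=\; \mathbb{E}\!\left[\frac{V}{\max(R,\,1)}\right]
\]
% Step-up BHq at target level q, with sorted p-values
% p_{(1)} \le p_{(2)} \le \dots \le p_{(m)}:
\[
  \hat{k} \;=\; \max\left\{\, i \in \{1,\dots,m\} \;:\; p_{(i)} \le \frac{i\,q}{m} \right\},
\]
% and the hypotheses with the \hat{k} smallest p-values are rejected
% (no rejections if the set is empty).
```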

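Motivation and Objectives

Develop the first differentially private algorithms for controlling the false discovery rate (FDR) in multiple hypothesis testing.
Preserve the statistical power of FDR control under differential privacy, keeping the accuracy loss due to privacy small.
Give the non-private Benjamini-Hochberg procedure a new theoretical footing under a weak assumption: only the true null test statistics need to be independent.
Design a low-distortion, one-shot differentially private top-$k$ algorithm that selects the most significant $p$-values efficiently and accurately.
Relate the FDR control properties of the private algorithm to those of the non-private version through a coupling argument, so that the guarantee is rigorous.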

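Proposed Method

Adapts the step-down variant of the Benjamini-Hochberg procedure by making each step differentially private, replacing classical thresholding with private statistical tests.
Gives a new proof for the non-private BHq procedure that assumes only independence of the true null test statistics and allows arbitrary correlations between the true nulls and the false nulls.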
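Introduces a novel "one-shot" differentially private top-$k$ primitive that selects the $k$ most significant $p$-values with distortion sublinear in $k$, as a faster alternative to iterative peeling.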
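Proves privacy of the one-shot top-$k$ algorithm using Bennett's inequality, establishing a concentration bound on the log-likelihood ratio under a $c$-closeness condition on probability vectors.
Builds on the structure of the new proof and uses a coupling argument to relate the FDR control of the private version to that of the non-private version.
Ensures that the $p$-value computation is differentially private by requiring the noise mechanism to satisfy certain technical conditions, giving an end-to-end privacy guarantee for the FDR control pipeline.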
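
The non-private baseline that the method adapts can be sketched in a few lines. The code below is only an illustration of the classical step-up rule and a step-down variant; it contains no privacy mechanism, the function names are hypothetical, and using the same thresholds $iq/m$ for the step-down variant is an assumption of this sketch rather than something the digest states.

```python
def _sorted_order(p_values):
    """Indices of p_values from most to least significant."""
    return sorted(range(len(p_values)), key=lambda i: p_values[i])


def bhq_step_up(p_values, q):
    """Classical step-up BHq: reject up to the LAST rank i with p_(i) <= i*q/m."""
    m = len(p_values)
    order = _sorted_order(p_values)
    k_hat = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_hat = rank  # keep scanning; a later rank may still pass
    return sorted(order[:k_hat])


def bhq_step_down(p_values, q):
    """Step-down variant: stop at the FIRST rank i with p_(i) > i*q/m."""
    m = len(p_values)
    order = _sorted_order(p_values)
    k_hat = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] > rank * q / m:
            break  # everything before this rank is rejected
        k_hat = rank
    return sorted(order[:k_hat])


if __name__ == "__main__":
    p = [0.01, 0.04, 0.03, 0.035]
    print("step-up rejects:  ", bhq_step_up(p, q=0.05))
    print("step-down rejects:", bhq_step_down(p, q=0.05))
```

The step-down loop only ever inspects the most significant remaining $p$-value, which is what makes it natural to combine with a "find the next smallest $p$-value" selection step; the step-up rule needs the full sorted list before it can decide.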

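Experimental Results

Research Questions

RQ1: Can we design differentially private algorithms that control the false discovery rate (FDR) in multiple hypothesis testing with minimal loss of statistical power?
RQ2: What is the weakest assumption that guarantees FDR control for the non-private BHq procedure, and how can it be exploited in the private setting?
RQ3: Can we build a differentially private top-$k$ selection primitive with sublinear dependence on $k$ and low distortion, without iterative peeling?
RQ4: How can the FDR control properties of the differentially private algorithm be formally related to those of the non-private version?
RQ5: What technical conditions must the $p$-value computation satisfy for the FDR control pipeline to be differentially private end to end?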

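Key Findings

The paper gives a new proof for the non-private BHq procedure under the weak assumption that only the true null test statistics are independent, which explains in part the robustness BHq shows in practice.
Under the same weak assumption, the proposed differentially private method controls FDR with essentially no loss in power under certain conditions, making it practical for privacy-sensitive settings.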
The one-shot top-$k$ primitive has distortion sublinear in $k$, comparable to the best known bound for iterative peeling, while being faster and computationally cheaper.
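The privacy proof for the one-shot top-$k$ algorithm introduces a new concentration technique, based on Bennett's inequality and applied to the log-likelihood ratio, that is of independent interest in privacy research.
By imposing technical conditions on the $p$-value computation, the method guarantees end-to-end differential privacy for FDR control, making a private hypothesis testing pipeline possible.
The theoretical analysis shows that the FDR control of the private algorithm is closely tied to that of the non-private version, supporting its reliability and interpretability.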
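
The contrast between the one-shot primitive and iterative peeling can be illustrated with a small sketch. It assumes Laplace noise and uses the abstract's "most popular hobbies" framing (largest counts win); the function names and the `noise_scale` parameter are hypothetical. Choosing `noise_scale` so that the output is actually differentially private is what the paper's analysis addresses and is not reproduced here, so this code makes no privacy claim by itself.

```python
import random


def laplace(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def one_shot_top_k(counts, k, noise_scale, rng):
    """Add noise to every count ONCE, then report the k largest noisy counts."""
    noisy = [c + laplace(noise_scale, rng) for c in counts]
    ranked = sorted(range(len(counts)), key=lambda i: noisy[i], reverse=True)
    return ranked[:k]


def peeling_top_k(counts, k, noise_scale, rng):
    """Baseline: k rounds of report-noisy-max, removing the winner each round.

    Fresh noise is drawn in every round, so the work grows with k * len(counts).
    Requires k <= len(counts).
    """
    remaining = list(range(len(counts)))
    chosen = []
    for _ in range(k):
        noisy = {i: counts[i] + laplace(noise_scale, rng) for i in remaining}
        winner = max(noisy, key=noisy.get)
        chosen.append(winner)
        remaining.remove(winner)
    return chosen


if __name__ == "__main__":
    hobby_counts = [1000, 10, 900, 5, 800]  # illustrative data only
    rng = random.Random(0)
    print("one-shot:", one_shot_top_k(hobby_counts, 3, noise_scale=1.0, rng=rng))
    print("peeling: ", peeling_top_k(hobby_counts, 3, noise_scale=1.0, rng=rng))
```

To apply the same idea to hypothesis testing, one would select the smallest noisy $p$-values (or suitably transformed $p$-values) instead of the largest counts; that adaptation is left out of the sketch.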

This digest was generated by AI and reviewed by a human editor.