QUICK REVIEW

[Paper Review] Towards "simultaneous selective inference": post-hoc bounds on the false discovery proportion

Eugene Katsevich, Aaditya Ramdas|arXiv (Cornell University)|Mar 19, 2018

Machine Learning and Algorithms | Cited by 5

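ONE-SENTENCE SUMMARY

This paper proves uniform, data-dependent bounds on the false discovery proportion (FDP) along the entire rejection path of existing FDR procedures, so that a rejection set can be chosen post hoc while keeping high-confidence FDP control. For a variety of FDR procedures, including Benjamini-Hochberg, and under independent p-values, the FDP is with high probability uniformly bounded by a small constant multiple (close to 2 at the 95% confidence level) of the target FDR level $q$, which enables simultaneous selective inference.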

ABSTRACT

The false discovery rate (FDR) has become a popular Type-I error criterion for multiple testing, but it is not without its flaws. Indeed, (a) controlling the mean of the false discovery proportion (FDP) does not preclude large FDP variability, and (b) committing to an error level $q$ before observing the data limits its use in exploratory data analysis. We take a step towards addressing both of the above drawbacks by proving uniform FDP bounds for a variety of existing FDR procedures. In particular, many such procedures proceed by examining a $\textit{path}$ of potential rejection sets $\varnothing = \mathcal R_0 \subseteq \mathcal R_1 \subseteq \cdots \subseteq \mathcal R_n \subseteq [n]$, assigning an estimate $\widehat{\text{FDP}}(\mathcal R_k)$ to each one, and choosing the final rejection set $\mathcal R_{k^*}$ via $k^* = \max\{k: \widehat{\text{FDP}}(\mathcal R_k) \leq q\}$. We prove that for a wide variety of such procedures (including Benjamini-Hochberg), under independent p-values, $\widehat{\text{FDP}}$ bounds the FDP to within a small explicit constant factor $c_{\text{alg}}(\alpha)$, uniformly across the entire path, with probability $1-\alpha$. This constant is close to 2 for several procedures at the 95% confidence level. These bounds imply that existing FDR procedures also have FDP bounded with high probability by a small constant multiple of the target FDR level $q$. Our bounds also open up a middle ground between fully simultaneous inference and fully selective inference. They allow the scientist to $\textit{spot}$ one or more suitable rejection sets (Select Post-hoc On the algorithm's Trajectory) by picking data-dependent sizes or error-levels, after examining the entire path of $\widehat{\text{FDP}}(\mathcal R_k)$ and the uniform upper band on FDP.
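
Read literally, the guarantee described in the abstract can be written as a single displayed statement. This is a restatement in the abstract's own notation, not the paper's exact theorem, whose precise form of the estimate and constant may differ:

```latex
\Pr\left\{ \text{FDP}(\mathcal R_k) \le c_{\text{alg}}(\alpha)\,
  \widehat{\text{FDP}}(\mathcal R_k) \ \text{ for all } k \right\} \ge 1-\alpha .
% On this event, because k^* is chosen so that \widehat{FDP}(R_{k^*}) <= q:
\text{FDP}(\mathcal R_{k^*}) \le c_{\text{alg}}(\alpha)\,
  \widehat{\text{FDP}}(\mathcal R_{k^*}) \le c_{\text{alg}}(\alpha)\, q .
```

The second line is the step behind the claim that existing FDR procedures keep the FDP below a small constant multiple of $q$ with high probability.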

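MOTIVATION AND GOALS

Address a limitation of FDR control: it controls only the mean of the false discovery proportion (FDP) and places no constraint on its variability.
Remove the rigidity of having to fix the error level $q$ before looking at the data, which hinders exploratory data analysis.
Provide uniform, high-probability FDP bounds for every rejection set on the path, so that a rejection set can be chosen in a data-driven way after the whole path has been observed.
Offer a middle ground between fully simultaneous inference and fully selective inference, allowing post-hoc selection of rejection sets with guaranteed FDP control.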
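
The first point, that controlling the mean of the FDP says little about its spread, can be illustrated with a toy simulation. The sketch below is not from the paper: the function names `bh_reject` and `simulate_fdp`, the sample sizes, and the data model (non-null p-values drawn as `U**6`) are all assumptions chosen for illustration.

```python
import random
from statistics import mean


def bh_reject(p_values, q):
    """Indices rejected by the Benjamini-Hochberg step-up rule at level q."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    k_star = 0
    for k, i in enumerate(order, start=1):
        # Step-up rule: keep the largest k with p_(k) <= q * k / n.
        if p_values[i] <= q * k / n:
            k_star = k
    return order[:k_star]


def simulate_fdp(n=200, n_signal=20, q=0.1, reps=500, seed=0):
    """Realized FDP of BH over repeated draws from a hypothetical model."""
    rng = random.Random(seed)
    fdps = []
    for _ in range(reps):
        # Assumed model: the first n_signal hypotheses are non-null with
        # p = U**6 (stochastically small); the rest are null, p ~ Uniform(0, 1).
        p = [rng.random() ** 6 if i < n_signal else rng.random()
             for i in range(n)]
        rejected = bh_reject(p, q)
        false_discoveries = sum(1 for i in rejected if i >= n_signal)
        # FDP is defined as 0 when nothing is rejected.
        fdps.append(false_discoveries / max(len(rejected), 1))
    return fdps


fdps = simulate_fdp()
print("mean FDP (estimates the FDR):", round(mean(fdps), 3))
print("95th percentile of FDP:", round(sorted(fdps)[int(0.95 * len(fdps))], 3))
```

Comparing the printed mean with the upper percentile gives a rough sense of how far individual realizations can sit from the average; the numbers depend entirely on the assumed model.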

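PROPOSED METHOD

Formalize FDR procedures as a path of nested rejection sets $\mathcal{R}_0 \subseteq \mathcal{R}_1 \subseteq \cdots \subseteq \mathcal{R}_n$, where each set corresponds to a p-value threshold.
Use existing FDR estimation techniques to assign an estimated FDP, $\widehat{\text{FDP}}(\mathcal{R}_k)$, to each rejection set $\mathcal{R}_k$.
Prove that, under independent p-values, the true FDP is bounded above by $c_{\text{alg}}(\alpha) \cdot \widehat{\text{FDP}}(\mathcal{R}_k)$ simultaneously for all $k$ with probability $1 - \alpha$, where $c_{\text{alg}}(\alpha)$ is a small explicit constant.
Show that $c_{\text{alg}}(\alpha)$ is close to 2 at the 95% confidence level for several standard FDR procedures (such as Benjamini-Hochberg).
Use these uniform bounds to let researchers choose a rejection set post hoc from the observed path of $\widehat{\text{FDP}}(\mathcal{R}_k)$, without the FDP exceeding a controlled multiple of $q$.
Enable a new form of inference, "simultaneous selective inference", in which the final rejection set is chosen adaptively from the path while retaining high-probability FDP control.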
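
The path construction above can be sketched for the Benjamini-Hochberg case. This is a minimal illustration, assuming the standard BH estimate $n \, p_{(k)} / k$ for the set $\mathcal{R}_k$ of the $k$ smallest p-values; the function names `bh_path` and `select_k_star` and the example p-values are hypothetical and do not come from the paper.

```python
from typing import List, Tuple


def bh_path(p_values: List[float]) -> List[Tuple[int, float]]:
    """Return (k, FDP_hat(R_k)) for k = 1..n, with R_k = the k smallest p-values."""
    n = len(p_values)
    sorted_p = sorted(p_values)
    # Assumed BH-style estimate: n * p_(k) / k, i.e. the expected number of
    # nulls below the threshold p_(k), divided by the number of rejections k.
    return [(k, n * sorted_p[k - 1] / k) for k in range(1, n + 1)]


def select_k_star(path: List[Tuple[int, float]], q: float) -> int:
    """k* = max{k : FDP_hat(R_k) <= q}; returns 0 (empty set) if no k qualifies."""
    eligible = [k for k, estimate in path if estimate <= q]
    return max(eligible) if eligible else 0


# Made-up p-values, for illustration only.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
path = bh_path(p_values)
for k, estimate in path:
    print(f"k={k:2d}  FDP_hat={estimate:.3f}")
print("k* at q=0.05:", select_k_star(path, 0.05))
```

Note that the estimate is not monotone in $k$, which is why $k^*$ is defined as a maximum over the whole path rather than as the first crossing of $q$.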

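EXPERIMENTAL RESULTS

Research Questions

RQ1: Can we provide uniform, high-probability bounds on the false discovery proportion (FDP) for every rejection set on the path of an FDR procedure?
RQ2: When standard FDR procedures such as Benjamini-Hochberg are used, to what extent is the variability of the FDP still under control?
RQ3: Can we select rejection sets post hoc, in a data-dependent way and without fixing the error level $q$ in advance, while retaining FDP control?
RQ4: What is the tightest constant factor $c_{\text{alg}}(\alpha)$ such that the true FDP is bounded by $c_{\text{alg}}(\alpha) \cdot \widehat{\text{FDP}}(\mathcal{R}_k)$ with probability $1 - \alpha$?
RQ5: How can path-based FDR procedures be used to bridge the gap between fully simultaneous inference and fully selective inference?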

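Key Findings

For a broad class of FDR procedures (including Benjamini-Hochberg), the true FDP is bounded by $c_{\text{alg}}(\alpha) \cdot \widehat{\text{FDP}}(\mathcal{R}_k)$ simultaneously for all $k$ with probability $1 - \alpha$.
At the 95% confidence level, the constant $c_{\text{alg}}(\alpha)$ is close to 2 for several procedures, so with high probability the FDP is at most roughly twice the estimated FDP.
The bounds hold uniformly over the entire path of rejection sets, so high-confidence inference is available at any point on the path.
Rejection sets can be selected post hoc using data-dependent criteria (such as size or error level), with FDP control guaranteed by the uniform bound.
This yields a new inference paradigm, "simultaneous selective inference", in which researchers can spot (Select Post-hoc On the algorithm's Trajectory) one or more candidate rejection sets after observing the full path.
The results show that existing FDR procedures control not only the mean of the FDP but also, with high probability, the FDP itself at a small constant multiple of $q$, which addresses a key limitation of standard FDR control.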
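
Post-hoc selection with the uniform band can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the constant is passed in as a plain number (`C_ASSUMED = 2.0` is a placeholder reflecting only the "close to 2" statement, not a value from the paper), the estimate is the BH-style $n \, p_{(k)} / k$, and the names `fdp_hat_path`, `upper_band`, and `spot_largest` are hypothetical.

```python
from typing import List


def fdp_hat_path(p_values: List[float]) -> List[float]:
    """BH-style estimates: entry k-1 holds n * p_(k) / k for the k smallest p-values."""
    n = len(p_values)
    sorted_p = sorted(p_values)
    return [n * sorted_p[k - 1] / k for k in range(1, n + 1)]


def upper_band(estimates: List[float], c: float) -> List[float]:
    """Upper band c * FDP_hat(R_k), capped at 1 because the FDP is a proportion."""
    return [min(1.0, c * estimate) for estimate in estimates]


def spot_largest(band: List[float], fdp_budget: float) -> int:
    """Largest k whose band value is within fdp_budget; 0 means reject nothing."""
    eligible = [k for k, bound in enumerate(band, start=1) if bound <= fdp_budget]
    return max(eligible) if eligible else 0


# Made-up p-values and a placeholder constant, for illustration only.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
C_ASSUMED = 2.0  # assumption: stands in for c_alg(alpha); not taken from the paper

band = upper_band(fdp_hat_path(p_values), C_ASSUMED)
for k, bound in enumerate(band, start=1):
    print(f"k={k:2d}  upper band on FDP={bound:.3f}")

# Because the band is meant to hold for all k at once, the budget may be
# chosen after looking at the printout above.
print("largest k with band <= 0.18:", spot_largest(band, 0.18))
```

The design point is that the budget is an argument supplied after the band has been inspected; with a pre-specified $q$ the same code reduces to the usual fixed-level selection, up to the constant.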


This review was generated by AI and reviewed by human editors.