QUICK REVIEW

[Paper Review] Multiple testing with the structure adaptive Benjamini-Hochberg algorithm

Ang Li, Rina Foygel Barber | arXiv (Cornell University) | Jun 25, 2016

Statistical Methods in Clinical Trials | References: 18 | Cited by: 29

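One-sentence summary

This paper proposes SABHA, a structure-adaptive Benjamini-Hochberg algorithm that reweights p-values with data-adaptive weights to raise statistical power in multiple testing when the signals follow a known structural pattern (e.g. grouped, ordered, or low total variation). The method controls the false discovery rate (FDR) at a level at most slightly above the target, with the excess FDR bounded in terms of the Rademacher complexity of the class of weights, so it makes more discoveries in signal-rich regions while keeping the FDR close to the target level.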

ABSTRACT

In multiple testing problems, where a large number of hypotheses are tested simultaneously, false discovery rate (FDR) control can be achieved with the well-known Benjamini-Hochberg procedure, which adapts to the amount of signal present in the data. Many modifications of this procedure have been proposed to improve power in scenarios where the hypotheses are organized into groups or into a hierarchy, as well as other structured settings. Here we introduce SABHA, the "structure-adaptive Benjamini-Hochberg algorithm", as a generalization of these adaptive testing methods. SABHA incorporates prior information about any pre-determined type of structure in the pattern of locations of the signals and nulls within the list of hypotheses, to reweight the p-values in a data-adaptive way. This raises the power by making more discoveries in regions where signals appear to be more common. Our main theoretical result proves that SABHA controls FDR at a level that is at most slightly higher than the target FDR level, as long as the adaptive weights are constrained sufficiently so as not to overfit too much to the data; interestingly, the excess FDR can be related to the Rademacher complexity or Gaussian width of the class from which we choose our data-adaptive weights. We apply this general framework to various structured settings, including ordered, grouped, and low total variation structures, and get the bounds on FDR for each specific setting. We also examine the empirical performance of SABHA on fMRI activity data and on gene/drug response data, as well as on simulated data.

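Motivation and objectives

To address the fact that standard multiple testing methods treat all hypotheses as exchangeable and therefore ignore known structural patterns in the locations of signals and nulls.
To develop a general framework that incorporates prior structural knowledge (e.g. grouping, ordering, spatial clustering) into FDR-controlling procedures in order to raise statistical power.
To provide finite-sample FDR control guarantees under data-adaptive weighting, even when the p-values are dependent, using complexity measures such as Rademacher complexity to guard against overfitting.
To demonstrate the method's empirical effectiveness on real and simulated datasets across several structured settings, including fMRI and gene expression data.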

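Proposed method

SABHA reweights the p-values with data-adaptive weights built from a prior structural assumption (e.g. grouped, ordered, low variation), which makes the procedure more sensitive in regions where signals are denser.
The method is a modified Benjamini-Hochberg procedure: the rejection threshold is adjusted through the weighted p-values, and the weights are chosen to reflect the expected prevalence of signals in each region.
FDR control is ensured by constraining the class of weights, with the constraint quantified by complexity measures such as Rademacher complexity or Gaussian width, so that the weights do not overfit to noise.
The algorithm uses a plug-in estimator of the proportion of null hypotheses within each structural unit (e.g. a group or an interval), which supplies the input for the adaptive weighting scheme.
The procedure applies to independent p-values and to positively dependent p-values satisfying the PRDS condition, which broadens its practical applicability.
The key component is the data-adaptive weight selection mechanism, which balances detection power against FDR control through theoretical bounds on the risk of overfitting.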
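The reweighting and thresholding steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `estimate_group_null_proportions` and `sabha_reject`, the cutoff `tau`, and the lower bound `eps` are hypothetical choices made for this example, and the per-group estimate is a simplified Storey-style ratio rather than the constrained estimate analysed in the paper. The rejection rule assumed here is a weighted step-up: reject hypothesis i when p_i <= min(tau, alpha * k / (n * q_i)), with k the largest count for which at least k p-values pass.

```python
import numpy as np


def estimate_group_null_proportions(pvals, groups, tau=0.5, eps=0.1):
    """Simplified per-group null proportion estimate (an assumption of this sketch).

    Each group's q is estimated as #{p_i > tau in group} / (n_group * (1 - tau))
    and clipped to [eps, 1], so a weight can never fall below eps.
    """
    pvals = np.asarray(pvals, dtype=float)
    groups = np.asarray(groups)
    q_hat = np.empty_like(pvals)
    for g in np.unique(groups):
        mask = groups == g
        frac_large = np.mean(pvals[mask] > tau)
        q_hat[mask] = np.clip(frac_large / (1.0 - tau), eps, 1.0)
    return q_hat


def sabha_reject(pvals, q_hat, alpha=0.1, tau=0.5):
    """Weighted step-up rule: reject p_i <= min(tau, alpha * k / (n * q_i))."""
    pvals = np.asarray(pvals, dtype=float)
    q_hat = np.asarray(q_hat, dtype=float)
    n = pvals.size
    # Reweighted p-values; a p-value above tau can never be rejected.
    weighted = np.where(pvals <= tau, pvals * q_hat, np.inf)
    ordered = np.sort(weighted)
    ks = np.arange(1, n + 1)
    passing = np.nonzero(ordered <= alpha * ks / n)[0]
    if passing.size == 0:
        return np.zeros(n, dtype=bool)
    k_hat = passing[-1] + 1  # largest k whose k-th smallest value passes
    return weighted <= alpha * k_hat / n


# Toy data: group 0 is signal-rich, group 1 is mostly null.
pvals = np.array([0.001, 0.004, 0.018, 0.03, 0.6, 0.04, 0.55, 0.7, 0.8, 0.95])
groups = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
q_hat = estimate_group_null_proportions(pvals, groups, tau=0.5, eps=0.1)
rejected = sabha_reject(pvals, q_hat, alpha=0.05, tau=0.5)
unweighted = sabha_reject(pvals, np.ones_like(pvals), alpha=0.05, tau=1.0)
print(q_hat, rejected.sum(), unweighted.sum())
```

Setting every weight to 1 and `tau` to 1 reduces the rule to ordinary Benjamini-Hochberg, which is what the `unweighted` call does. The lower bound `eps` stands in for the constraint on the weight class: without some restriction of this kind the weights could be driven arbitrarily low in a lucky region and overfit to the data.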

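Experimental results

Research questions

RQ1: Can the power of multiple testing procedures be improved by incorporating known structural patterns in the locations of signals and nulls?
RQ2: How can weights be assigned to p-values adaptively, based on structural information, without compromising FDR control?
RQ3: What theoretical guarantees on FDR control hold when data-adaptive weights are used under structural constraints?
RQ4: On structured data, how does SABHA compare with existing methods such as BH and Storey-BH in terms of discovery rate and FDR control?
RQ5: In real-world settings (such as fMRI or gene/drug response studies), does SABHA deliver a meaningful gain in discovery power?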

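Key findings

On the fMRI data, SABHA made 1,234 discoveries, more than BH (931) and Storey-BH (1,217), with the gains concentrated in the ROIs estimated to have high signal density.
The estimated null proportion for each ROI in SABHA (q̂) accurately predicted where the discovery rate improved: the ROIs with the lowest q̂ saw the largest gains.
On the gene/drug response data, SABHA made more discoveries than BH and Storey-BH at the same target FDR level (α = 0.2), with the largest advantage when the signals are clustered.
The theoretical analysis shows that the excess FDR introduced by SABHA is bounded in terms of the Rademacher complexity of the class of weights, so the FDR stays close to the target level even with adaptive weights.
In simulations, SABHA kept the FDR at a level slightly above the nominal α, while achieving higher power than standard BH and Storey-BH under structured signal patterns.
The method was robust across the different structures considered (ordered, grouped, low total variation), which supports its generality and its ability to adapt to varied data patterns.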
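For readers who want to see what the two baselines in these comparisons compute, the following is a minimal Python sketch of BH and Storey-BH. It is an illustration under assumptions, not the code used in the paper: the function names `bh_reject` and `storey_bh_reject` are hypothetical, the cutoff `tau = 0.5` is an arbitrary choice, and the Storey estimate of the overall null proportion is capped at 1 here for simplicity.

```python
import numpy as np


def bh_reject(pvals, alpha=0.1):
    """Benjamini-Hochberg step-up: reject the k smallest p-values, where k is
    the largest index with p_(k) <= alpha * k / n."""
    pvals = np.asarray(pvals, dtype=float)
    n = pvals.size
    ordered = np.sort(pvals)
    passing = np.nonzero(ordered <= alpha * np.arange(1, n + 1) / n)[0]
    if passing.size == 0:
        return np.zeros(n, dtype=bool)
    return pvals <= alpha * (passing[-1] + 1) / n


def storey_bh_reject(pvals, alpha=0.1, tau=0.5):
    """Storey-BH: estimate one global null proportion pi0, then run BH on the
    rescaled p-values pi0 * p_i, never rejecting any p_i above tau."""
    pvals = np.asarray(pvals, dtype=float)
    n = pvals.size
    # Storey-type estimate with the +1 correction; capped at 1 (assumption).
    pi0_hat = min(1.0, (1 + np.sum(pvals > tau)) / (n * (1.0 - tau)))
    rescaled = np.where(pvals <= tau, pvals * pi0_hat, np.inf)
    return bh_reject(rescaled, alpha=alpha)


pvals = np.array([0.001, 0.004, 0.018, 0.03, 0.04, 0.2, 0.3, 0.45, 0.7, 0.95])
bh = bh_reject(pvals, alpha=0.05)
storey = storey_bh_reject(pvals, alpha=0.05, tau=0.5)
print(bh.sum(), storey.sum())
```

The contrast with SABHA is that Storey-BH uses a single estimated null proportion for the whole list of hypotheses, whereas SABHA lets the estimate vary with the structure (per group, along an ordering, and so on), which is where the additional discoveries in signal-rich regions come from.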

This review was generated by AI and reviewed by a human editor.