QUICK REVIEW

[Paper Review] ROOFS: RObust biOmarker Feature Selection

Bakhmach, Anastasiia, Paul Dufossé|SPIRE - Sciences Po Institutional REpository|Jan 8, 2026

Cancer Immunotherapy and Biomarkers | Cited by 0

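One-Sentence Summary

ROOFS benchmarks multiple feature selection methods on biomedical data to identify robust biomarker signatures, evaluating them by stability, optimism-corrected predictive performance, and true-positive/false-positive discovery on semi-synthetic data; it is demonstrated on NSCLC immunotherapy resistance data.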

ABSTRACT

Feature selection (FS) is essential for biomarker discovery and clinical predictive modeling. Over the past decades, methodological literature on FS has become rich and mature, offering a wide spectrum of algorithmic approaches. However, much of this methodological progress has not fully translated into applied biomedical research. Moreover, challenges inherent in biomedical data, such as high-dimensional feature space, low sample size, multicollinearity, and missing values, make FS non-trivial. To help bridge this gap between methodological development and practical application, we propose ROOFS (RObust biOmarker Feature Selection), a Python package available at https://gitlab.inria.fr/compo/roofs, designed to help researchers in the choice of FS method adapted to their problem. ROOFS benchmarks multiple FS methods on the user's data and generates reports summarizing a comprehensive set of evaluation metrics, including downstream predictive performance estimated using optimism correction, stability, robustness of individual features, and true positive and false positive rates assessed on semi-synthetic data with a simulated outcome. We demonstrate the utility of ROOFS on data from the PIONeeR clinical trial, aimed at identifying predictors of resistance to anti-PD-(L)1 immunotherapy in lung cancer. Of the 34 FS methods gathered in ROOFS, we evaluated 23 in combination with 11 classifiers (253 models) and identified a filter based on the union of Benjamini-Hochberg false discovery rate-adjusted p-values from t-test and logistic regression as the optimal approach, outperforming other methods including widely used LASSO. We conclude that comprehensive benchmarking with ROOFS has the potential to improve the reproducibility of FS discoveries and increase the translational value of clinical models.

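Motivation and Objectives

Advance feature selection for robust biomarker discovery in high-dimensional, multimodal biomedical data with missing values and multicollinearity.
Provide an automated Python package (ROOFS) that benchmarks multiple FS methods across data preprocessing, feature selection, and downstream modeling.
Provide comprehensive evaluation metrics, including stability, optimism-corrected predictive performance, and true-positive/false-positive rates derived from semi-synthetic outcomes.
Demonstrate the utility of ROOFS on an NSCLC lung cancer cohort (PIONeeR) for predicting resistance to anti-PD-(L)1 therapy and for guiding the choice of method.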

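Proposed Method

Gather 34 feature selection methods spanning filter, embedded, wrapper, and ensemble families, and benchmark 23 representative methods among them.
Apply downstream classifiers to the selected features, using scikit-learn-compatible models to evaluate predictive performance.
Use bootstrap resampling with replacement (B = 100 in the study) to estimate performance and stability under the same preprocessing and modeling pipeline.
Implement optimism correction of performance estimates via the Harrell, .632, and .632+ methods; the final report uses .632+.
Measure FS stability with the frequency-based measure of Nogueira, and report feature-level robustness for each method.
Use semi-synthetic outcomes to assess true and false discoveries, computing TPR, FPR, FDR, and FOR.
Include data preprocessing steps (median/mode imputation and z-score standardization) to handle missing values and feature scaling.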
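
The stability measure and the optimism correction listed above can be sketched in plain Python. This is a minimal illustration, not the ROOFS implementation: the function names `nogueira_stability` and `err_632plus` are hypothetical, the stability formula follows the frequency-based estimator of Nogueira et al., and the .632+ function uses the basic Efron-Tibshirani form with the relative overfitting rate clipped to [0, 1]. Applying it to AUC by working with 1 - AUC and a no-information rate of 0.5 is an assumption here, not something the summary states.

```python
from typing import Sequence


def nogueira_stability(selections: Sequence[Sequence[int]]) -> float:
    """Stability of M binary selection vectors over d features.

    selections[i][f] is 1 if feature f was selected in bootstrap run i, else 0.
    """
    m = len(selections)      # number of bootstrap runs
    if m < 2:
        raise ValueError("need at least two selection runs")
    d = len(selections[0])   # number of candidate features
    # Selection frequency of each feature across runs.
    freqs = [sum(run[f] for run in selections) / m for f in range(d)]
    k_bar = sum(freqs)       # average signature size
    if k_bar == 0 or k_bar == d:
        raise ValueError("undefined when no or all features are always selected")
    # Mean unbiased variance of the per-feature selection indicator.
    mean_var = sum(m / (m - 1) * p * (1 - p) for p in freqs) / d
    # Normalise by the variance expected from random selection of k_bar features.
    return 1.0 - mean_var / ((k_bar / d) * (1 - k_bar / d))


def err_632plus(apparent_err: float, oob_err: float, no_info_err: float) -> float:
    """Basic .632+ error estimate (hypothetical helper, not the ROOFS API).

    apparent_err: error on the training data
    oob_err:      mean out-of-bag error over bootstrap samples
    no_info_err:  error expected if features and outcome were independent
    """
    gap = no_info_err - apparent_err
    if gap > 0 and oob_err > apparent_err:
        # Relative overfitting rate R, clipped to at most 1.
        overfit = min((oob_err - apparent_err) / gap, 1.0)
    else:
        overfit = 0.0
    weight = 0.632 / (1.0 - 0.368 * overfit)
    return (1.0 - weight) * apparent_err + weight * oob_err
```

A stability of 1 means every bootstrap run selected exactly the same features, while values near 0 correspond to what random selection would give. For `err_632plus`, no overfitting (R = 0) reduces to the plain .632 weighting, and maximal overfitting (R = 1) returns the out-of-bag error itself.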

Figure 1: Overview of the roofs pipeline for comprehensive FS benchmarking.

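Experimental Results

Research Questions

RQ1: Which feature selection methods across algorithm families offer the best trade-off between stability and predictive performance on heterogeneous biomedical data?
RQ2: How do multicollinearity and missing data affect the robustness of biomarker signatures selected by different FS methods?
RQ3: Can the optimization and reporting in ROOFS guide researchers in clinical prediction settings toward FS methods that maximize true discoveries while controlling false positives?
RQ4: What added value does evaluating semi-synthetic true predictors bring to assessing the discovery ability of FS methods on real datasets?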

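Key Findings

Filter methods generally show strong stability and competitive AUC; in the real-data benchmark, p.adjust (the union of BH-adjusted p-values from the t-test and logistic regression) performed best.
VIF-based pre-filtering down to 214 features moderately improved FS stability across methods, making signatures more reliable.
LASSO showed moderate stability but fluctuated considerably across bootstrap samples because of multicollinearity; many features were selected inconsistently.
Wrapper and ensemble methods generally took longer to compute and did not consistently outperform simpler methods such as LASSO or certain filters in predictive performance.
The semi-synthetic benchmark revealed trade-offs: high TPR tended to come with higher FPR, and stability-focused methods tended to have lower TPR but better FDR control; in that setting, the p.adjust filter balanced high TPR against an acceptable FPR.
On the PIONeeR data, the p.adjust-based signature combined with gradient boosting achieved the highest optimism-corrected AUC (0.72) with strong stability (S = 0.39).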
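
To make the p.adjust filter and the discovery rates more concrete, here is a rough sketch in plain Python. It assumes the per-feature p-values from the t-test and from univariate logistic regression have already been computed elsewhere, and it interprets "union" as keeping a feature when either BH-adjusted p-value falls below a threshold. The function names, the threshold `alpha = 0.05`, and the toy p-values are all illustrative assumptions, not taken from ROOFS or from the paper.

```python
from typing import Dict, List, Sequence, Set


def bh_adjust(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def union_filter(p_ttest: Sequence[float], p_logit: Sequence[float],
                 alpha: float = 0.05) -> Set[int]:
    """Indices of features significant in either test after BH adjustment."""
    adj_t = bh_adjust(p_ttest)
    adj_l = bh_adjust(p_logit)
    return {i for i in range(len(adj_t)) if adj_t[i] <= alpha or adj_l[i] <= alpha}


def discovery_rates(selected: Set[int], truth: Set[int],
                    n_features: int) -> Dict[str, float]:
    """TPR, FPR, FDR and FOR of a selection against known true predictors."""
    tp = len(selected & truth)
    fp = len(selected - truth)
    fn = len(truth - selected)
    tn = n_features - tp - fp - fn

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "TPR": ratio(tp, tp + fn),  # share of true predictors recovered
        "FPR": ratio(fp, fp + tn),  # share of noise features selected
        "FDR": ratio(fp, tp + fp),  # share of the selection that is noise
        "FOR": ratio(fn, fn + tn),  # share of the unselected that is signal
    }


if __name__ == "__main__":
    # Toy p-values for five features (made up for illustration only).
    p_ttest = [0.001, 0.20, 0.03, 0.60, 0.04]
    p_logit = [0.002, 0.01, 0.50, 0.70, 0.30]
    chosen = union_filter(p_ttest, p_logit)
    print(sorted(chosen), discovery_rates(chosen, truth={0, 2}, n_features=5))
```

In a semi-synthetic setting the `truth` set is known because the outcome is simulated from chosen features, which is what makes these four rates computable on otherwise real data.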

Figure 2: Instability of LASSO. A: Bootstrap selection frequencies for features selected by LASSO in at least 1 bootstrap sample. B: Variability in selected subset sizes and out-of-bag AUC.


This review was generated by AI and reviewed by human editors.