QUICK REVIEW

[Paper Review] Effective Ways to Build and Evaluate Individual Survival Distributions

Humza Haider, Bret Hoehn|arXiv (Cornell University)|Nov 28, 2018

Liver Disease Diagnosis and Treatment | References: 41 | Cited by: 25

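One-Sentence Summary

This paper motivates individual survival distribution (ISD) models, which give patient-specific survival probabilities at all time points and so overcome the limitations of traditional approaches: risk scores, single-time probability models, and population-level Kaplan-Meier curves. It introduces D-Calibration and evaluates ISD models with several metrics, reporting that Multi-Task Logistic Regression (MTLR) consistently outperforms the other models on calibration, Brier score, and Concordance across a range of survival datasets.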
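
To make the idea of an ISD concrete, the sketch below stores one patient's survival curve as a step function on a time grid and reads off both a single-time probability and a median survival time. The class name `SurvivalCurve`, the time grid, and the probabilities are hypothetical illustration values, not outputs of any model in the paper.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SurvivalCurve:
    """Hypothetical container: one patient's ISD as a right-continuous step function."""
    times: List[float]  # increasing time points (e.g. months)
    probs: List[float]  # S(t | x) at each time point, non-increasing

    def prob_at(self, t: float) -> float:
        """Return S(t | x); the curve is 1.0 before the first grid point."""
        idx = bisect_right(self.times, t)
        return 1.0 if idx == 0 else self.probs[idx - 1]

    def median_time(self) -> Optional[float]:
        """First grid time where S(t | x) is 0.5 or below, or None if never reached."""
        for t, p in zip(self.times, self.probs):
            if p <= 0.5:
                return t
        return None


# Illustrative values only (assumed, not taken from the paper).
curve = SurvivalCurve(times=[6, 12, 24, 36, 60],
                      probs=[0.95, 0.85, 0.60, 0.45, 0.20])
five_year = curve.prob_at(60)  # what a single-time (5-year) model would report
median = curve.median_time()   # one possible point summary of the whole curve
```

A single-time model would expose only `five_year`; the full curve also answers questions at any other time point.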

ABSTRACT

An accurate model of a patient's individual survival distribution can help determine the appropriate treatment for terminal patients. Unfortunately, risk scores (e.g., from Cox Proportional Hazard models) do not provide survival probabilities, single-time probability models (e.g., the Gail model, predicting 5 year probability) only provide for a single time point, and standard Kaplan-Meier survival curves provide only population averages for a large class of patients meaning they are not specific to individual patients. This motivates an alternative class of tools that can learn a model which provides an individual survival distribution which gives survival probabilities across all times - such as extensions to the Cox model, Accelerated Failure Time, an extension to Random Survival Forests, and Multi-Task Logistic Regression. This paper first motivates such "individual survival distribution" (ISD) models, and explains how they differ from standard models. It then discusses ways to evaluate such models - namely Concordance, 1-Calibration, Brier score, and various versions of L1-loss - and then motivates and defines a novel approach "D-Calibration", which determines whether a model's probability estimates are meaningful. We also discuss how these measures differ, and use them to evaluate several ISD prediction tools, over a range of survival datasets.

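Research Motivation and Objectives

Address the lack of accurate, individualized survival probability estimates across all time points in clinical decision-making.
Develop and assess evaluation measures, in particular D-Calibration, for judging whether survival probability estimates are meaningful.
Compare ISD models (e.g., Cox-KP, AFT, RSF-KM, MTLR) across diverse survival datasets under multiple evaluation criteria.
Show that ISD models give more clinically relevant and consistent predictions than single-time or risk-score models.
Advocate adopting ISD models, MTLR in particular, in clinical and research settings to improve prognostic accuracy.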

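Proposed Methods

Motivate individual survival distribution (ISD) models, which estimate S(t|x) for each patient x at every future time t ≥ 0.
Introduce D-Calibration, a novel measure of whether predicted survival probabilities agree with observed outcomes across time.
Use standard evaluation measures: discrimination (Concordance), 1-Calibration, Brier score, and L1-loss.
Apply and compare five ISD models on multiple real-world survival datasets: Cox-KP, Cox-EN-KP, AFT, RSF-KM, and MTLR.
Use time-dependent calibration and the integrated Brier score to assess probability accuracy over time.
Frame survival prediction as a regression task with right-censored outcomes, using patient-specific covariates.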
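
The D-Calibration check listed above can be sketched roughly as follows. For each patient, take the model's predicted survival probability at that patient's own event time, bin those values into ten equal-width intervals, and test whether the bins are filled about uniformly. This is a minimal sketch under stated assumptions: the function names are hypothetical, the fractional spreading of censored patients follows the general description of the method rather than the authors' reference code, and a fixed chi-square critical value (9 degrees of freedom, alpha = 0.05) stands in for a p-value.

```python
from typing import List, Sequence

# Chi-square critical value for 9 degrees of freedom at alpha = 0.05.
# Valid only for the default of 10 bins used below.
CHI2_CRIT_DF9_ALPHA05 = 16.919


def d_calibration_bins(surv_at_time: Sequence[float],
                       event_observed: Sequence[bool],
                       n_bins: int = 10) -> List[float]:
    """Accumulate (possibly fractional) patient counts per probability bin.

    surv_at_time[i] is the model's S_i(t_i) at patient i's event or censoring time.
    """
    width = 1.0 / n_bins
    counts = [0.0] * n_bins
    for s, observed in zip(surv_at_time, event_observed):
        k = min(int(s * n_bins), n_bins - 1)  # bin [k*width, (k+1)*width)
        if observed or s <= 0.0:
            counts[k] += 1.0
        else:
            # Censored: the probability at the unseen death time lies in [0, s].
            # Spread one unit of weight uniformly over that range (assumption).
            lower = k * width
            counts[k] += (s - lower) / s
            for j in range(k):
                counts[j] += width / s
    return counts


def chi_square_statistic(counts: Sequence[float]) -> float:
    """Pearson statistic against a uniform expectation over the bins."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def is_d_calibrated(surv_at_time: Sequence[float],
                    event_observed: Sequence[bool]) -> bool:
    """True when uniformity is not rejected at the 5% level (10 bins)."""
    counts = d_calibration_bins(surv_at_time, event_observed, n_bins=10)
    return chi_square_statistic(counts) <= CHI2_CRIT_DF9_ALPHA05
```

With a statistics library available, the fixed threshold could be replaced by a chi-square p-value; the threshold is used here only to keep the sketch dependency-free.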

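Experimental Results

Research Questions

RQ1: How can individual survival distributions be built effectively so that each patient receives accurate, time-specific survival probabilities?
RQ2: Which evaluation measures are most appropriate for assessing the reliability and calibration of ISD models?
RQ3: How do ISD models compare with traditional risk scores and single-time probability models in predictive accuracy and clinical utility?
RQ4: Does the proposed D-Calibration measure effectively identify models whose probability estimates are meaningful?
RQ5: Which ISD model performs best across multiple evaluation criteria, including calibration, discrimination, and Brier score?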
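
Two of the measures these questions refer to, discrimination (Concordance) and the Brier score, can be sketched in a few lines. This is a simplified illustration, not the evaluation code used in the paper: the concordance function is a plain pairwise version that ignores tied times, the `risk` input is a hypothetical score (for example, the negative of a predicted median survival time), and the Brier function assumes no patient is censored before the evaluation time, so it omits the censoring weights a full implementation would need.

```python
from typing import Sequence


def concordance_index(times: Sequence[float],
                      events: Sequence[bool],
                      risk: Sequence[float]) -> float:
    """Fraction of comparable pairs ranked correctly (higher risk dies first).

    A pair (i, j) is comparable when patient i had an observed event
    and times[i] < times[j]; risk ties count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def brier_score_at(t_star: float,
                   times: Sequence[float],
                   surv_at_t_star: Sequence[float]) -> float:
    """Mean squared error between S_i(t*) and the alive-at-t* indicator.

    Simplifying assumption: every patient's status at t* is known.
    """
    errors = [((1.0 if t > t_star else 0.0) - s) ** 2
              for t, s in zip(times, surv_at_t_star)]
    return sum(errors) / len(errors)
```

Concordance looks only at the ordering of patients, while the Brier score penalizes the probabilities themselves, which is one reason a model can do well on one measure and poorly on the other.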

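Key Findings

MTLR consistently outperforms the other ISD models on L1-loss, integrated Brier score, and Concordance across a range of survival datasets.
MTLR also matches or exceeds every other model on the calibration measures, indicating that its predicted probabilities agree most closely with observed survival outcomes.
Single-time probability models (e.g., 5-year survival) can lead to inconsistent clinical decisions, because the ranking of patients may reverse from one time point to another.
D-Calibration effectively identifies models whose predicted survival probabilities are meaningful and well calibrated across time.
ISD models give more clinically relevant information than risk scores or single-time models, because they support decisions at any time point and allow individual survival curves to be visualized.
The study shows that ISD models, MTLR in particular, produce more reliable patient-specific survival predictions and can support clinical decision-making.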
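
The ranking-reversal finding can be illustrated with two survival curves that cross. The sketch below uses Weibull curves with hypothetical shape and scale parameters chosen only so that the curves intersect; they are not fitted to any dataset from the paper.

```python
import math
from typing import Tuple


def weibull_survival(t: float, shape: float, scale: float) -> float:
    """S(t) = exp(-(t / scale) ** shape) for a Weibull distribution."""
    return math.exp(-((t / scale) ** shape))


# Hypothetical patients as (shape, scale): A has high early risk and a long tail,
# B has low early risk followed by a steep decline. Both curves meet at t = 5.
PATIENT_A: Tuple[float, float] = (0.5, 5.0)
PATIENT_B: Tuple[float, float] = (3.0, 5.0)


def higher_risk_at(t: float) -> str:
    """Return the patient with the lower survival probability at time t."""
    s_a = weibull_survival(t, *PATIENT_A)
    s_b = weibull_survival(t, *PATIENT_B)
    return "A" if s_a < s_b else "B"


early = higher_risk_at(2.0)  # before the crossing point
late = higher_risk_at(8.0)   # after the crossing point
```

Under these assumed parameters patient A looks worse at the early time point and patient B looks worse at the late one, so a model reporting only one time point would rank the two patients differently depending on which time point it happens to use.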

This review was generated by AI and reviewed by human editors.