QUICK REVIEW

[Paper Review] Double Variable Importance Matching to Estimate Distinct Causal Effects on Event Probability and Timing

Yuqi Li, Quinn Lanners|arXiv (Cornell University)|Feb 4, 2026
Advanced Causal Inference Techniques | Cited by 0
One-Sentence Summary

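This paper proposes a double-matching framework that uses mixture cure models to learn two separate distance metrics, one for estimating heterogeneous treatment effects (HTE) on cure probability and one for effects on the conditional mean event time in time-to-event data; Kaplan-Meier estimates within the matched groups then yield interpretable HTEs.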

ABSTRACT

In many clinical contexts, estimating effects of treatment in time-to-event data is complicated not only by confounding, censoring, and heterogeneity, but also by the presence of a cured subpopulation in which the event of interest never occurs. In such settings, treatment may have distinct effects on (1) the probability of being cured and (2) the event timing among non-cured individuals. Standard survival analysis and causal inference methods typically do not separate cured from non-cured individuals, obscuring distinct treatment mechanisms on cure probability and event timing. To address these challenges, we propose a matching-based framework that constructs distinct match groups to estimate heterogeneous treatment effects (HTE) on cure probability and event timing, respectively. We use mixture cure models to identify feature importance for both estimands, which in turn informs weighted distance metrics for matching in high-dimensional spaces. Within matched groups, Kaplan-Meier estimators provide estimates of cure probability and expected time to event, from which individual-level treatment effects are derived. We provide theoretical guarantees for estimator consistency and distance metric optimality under an equal-scale constraint. We further decompose estimation error into contributions from censoring, model fitting, and irreducible noise. Simulations and real-world data analyses demonstrate that our approach delivers interpretable and robust HTE estimates in time-to-event settings.

Motivation and Objectives

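  • Motivate why long-term cure probability should be separated from short-term event timing in time-to-event analysis.
  • Introduce mixture cure models to identify covariate importance for cure and for timing, yielding distance metrics tailored to each matching task.
  • Develop a double-matching framework that gives consistent estimates of two distinct estimands.
  • Provide theoretical guarantees for estimator consistency and distance-metric optimality under an equal-scale constraint.
  • Demonstrate performance on simulations and on a real-world leukemia transplant dataset.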

Proposed Method

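  • Fit a mixture cure model separately in each treatment arm to obtain covariate coefficients for the cure probability and for the event-time distribution.
  • Build two weighted distance metrics from the absolute coefficients: W_cure = diag(|β1|, |β0|) and W_time = diag(|λ1|, |λ0|).
  • For each estimand, perform KNN-style matching under its own distance metric to form match groups.
  • Within the matched treated group (M1) and matched control group (M0), estimate cure probability by the Kaplan-Meier survival probability at horizon H; the treatment effect on cure probability is π(x) = S_M1(H) − S_M0(H).
  • Within the match groups, estimate the conditional mean event time (CMET) from the integrated KM curve; the treatment effect is Δ(x) = [∫0^H S_M1(t) dt − H S_M1(H)] / [1 − S_M1(H)] minus the analogous term for Z = 0.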
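The matching step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the coefficient vectors are made up rather than fitted by a mixture cure model, the per-covariate weight is taken to be the sum of the absolute coefficients from the two arms (an assumption about how diag(|β1|, |β0|) is combined), and the distance is a weighted squared Euclidean distance (also an assumption). The helper names `importance_weights`, `weighted_knn`, and `match_group` are hypothetical.

```python
import numpy as np

def importance_weights(coef_treated, coef_control):
    """Per-covariate weights from arm-specific coefficients.
    Summing the two arms' absolute values is an assumption of this sketch."""
    return np.abs(coef_treated) + np.abs(coef_control)

def weighted_knn(query, X, weights, k):
    """Indices of the k rows of X nearest to query under a
    weighted squared Euclidean distance (assumed form)."""
    dists = ((X - query) ** 2) @ weights
    return np.argsort(dists, kind="stable")[:k]

def match_group(query, X, z, weights, k):
    """Return (treated indices, control indices) of the k nearest units per arm."""
    treated = np.flatnonzero(z == 1)
    control = np.flatnonzero(z == 0)
    m1 = treated[weighted_knn(query, X[treated], weights, k)]
    m0 = control[weighted_knn(query, X[control], weights, k)]
    return m1, m0

# Toy data: covariate 0 drives cure, covariate 1 drives timing (made-up values).
X = np.array([[0.0, 5.0], [0.1, 0.0], [3.0, 0.1],
              [0.0, 4.0], [2.0, 0.0], [0.2, 0.2]])
z = np.array([1, 1, 1, 0, 0, 0])
query = np.array([0.0, 0.0])

# Hypothetical coefficients standing in for fitted mixture cure model output.
w_cure = importance_weights(np.array([2.0, 0.0]), np.array([1.0, 0.0]))
w_time = importance_weights(np.array([0.0, 1.5]), np.array([0.0, 0.5]))

cure_match = match_group(query, X, z, w_cure, k=1)
time_match = match_group(query, X, z, w_time, k=1)
print("cure match group:", cure_match)
print("time match group:", time_match)
```

In this toy example the same query unit receives different matched neighbors for the two estimands, which is the point of learning a separate metric for cure and for timing.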
Figure 1: Hypothetical Survival Curves Where Treatment Increases the Cure Probability yet Reduces the Conditional Mean Event Time.
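The two within-group estimates described in the method can be sketched as follows. This is an illustrative NumPy implementation, not the authors' code: `km_curve` and `cure_and_cmet` are hypothetical helper names, the toy survival data are invented, and reading S(H) as a cure probability assumes the horizon H is long enough for the Kaplan-Meier curve to have plateaued. The CMET uses the identity E[T | T ≤ H] = [∫0^H S(t) dt − H S(H)] / [1 − S(H)].

```python
import numpy as np

def km_curve(times, events):
    """Kaplan-Meier estimate: distinct event times and S(t) just after each."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(times[events == 1])
    surv, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)
        n_events = np.sum((times == t) & (events == 1))
        s *= 1.0 - n_events / at_risk
        surv.append(s)
    return event_times, np.array(surv)

def cure_and_cmet(times, events, H):
    """Return (S(H), conditional mean event time given an event by H)."""
    t, s = km_curve(times, events)
    keep = t <= H
    t, s = t[keep], s[keep]
    # Step function: S = 1 on [0, t_1), then s[j] from t_j until the next knot.
    knots = np.concatenate(([0.0], t, [H]))
    heights = np.concatenate(([1.0], s))
    area = np.sum(heights * np.diff(knots))  # integral of S over [0, H]
    s_H = heights[-1]
    if s_H >= 1.0:
        return s_H, np.nan  # no events observed by H, CMET undefined
    return s_H, (area - H * s_H) / (1.0 - s_H)

# Invented matched groups: event indicator 1 = event observed, 0 = censored.
H = 10.0
cure1, cmet1 = cure_and_cmet([2, 4, 10, 10], [1, 1, 0, 0], H)  # treated, M1
cure0, cmet0 = cure_and_cmet([1, 2, 3, 10], [1, 1, 1, 0], H)   # control, M0

cure_effect = cure1 - cure0  # pi(x) = S_M1(H) - S_M0(H)
cmet_effect = cmet1 - cmet0  # Delta(x)
print(cure_effect, cmet_effect)
```

With no censoring before the horizon, the CMET in each group reduces to the plain average of the observed event times (3 and 2 in this toy data), which is a quick sanity check on the integral formula.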

Experimental Results

Research Questions

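  • RQ1: Can heterogeneous treatment effects on cure probability and on event timing be separated and estimated within a time window H?
  • RQ2: In high-dimensional settings, does double, outcome-informed matching improve estimation accuracy for both estimands?
  • RQ3: Are the proposed estimators consistent under standard causal assumptions and the mixture cure framework?
  • RQ4: How do cure-specific and time-specific distance metrics affect match quality and estimation error relative to standard methods?
  • RQ5: How do these methods perform in simulations and in a real clinical cohort?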

Key Findings

| Method | Cure (Setting 1) | Time (Setting 1) | Cure (Setting 2) | Time (Setting 2) | Cure (Setting 3) | Time (Setting 3) | Cure (Setting 4) | Time (Setting 4) |
|---|---|---|---|---|---|---|---|---|
| Oracle | 6.6 ± 0.2 | 12.3 ± 1.3 | 6.3 ± 0.3 | 22.0 ± 1.8 | 6.7 ± 0.4 | 16.0 ± 1.9 | 6.8 ± 0.3 | 18.2 ± 1.5 |
| Partial Oracle | 7.7 ± 0.3 | 26.8 ± 1.5 | 7.6 ± 0.2 | 32.6 ± 1.5 | 8.0 ± 0.3 | 31.3 ± 1.8 | 8.3 ± 0.3 | 33.4 ± 1.5 |
| MCM KNN | 7.9 ± 0.3 | 26.8 ± 1.0 | 7.6 ± 0.2 | 33.3 ± 1.5 | 8.3 ± 0.3 | 31.4 ± 1.3 | 8.7 ± 0.3 | 38.5 ± 1.3 |
| MCM KNN combined | 8.0 ± 0.3 | 27.6 ± 1.1 | 7.8 ± 0.2 | 39.4 ± 1.6 | 8.7 ± 0.3 | 33.4 ± 1.2 | 8.9 ± 0.3 | 45.9 ± 1.5 |
| Feature Selection KNN | 8.2 ± 0.3 | 28.0 ± 1.1 | 8.0 ± 0.2 | 43.2 ± 1.8 | 9.4 ± 0.5 | 38.1 ± 2.5 | 9.4 ± 0.4 | 41.5 ± 2.6 |
| Euclidean KNN | 9.9 ± 0.3 | 29.8 ± 1.1 | 8.3 ± 0.2 | 48.7 ± 1.6 | 10.5 ± 0.4 | 52.1 ± 1.1 | 10.8 ± 0.4 | 55.6 ± 1.3 |
| Propensity Score KNN | 17.2 ± 0.2 | 37.1 ± 1.3 | 9.6 ± 0.4 | 91.9 ± 1.2 | 19.2 ± 0.2 | 70.3 ± 0.9 | 19.2 ± 0.2 | 88.7 ± 0.9 |
| Prognostic Score KNN | 13.6 ± 0.5 | 38.3 ± 1.8 | 8.4 ± 0.4 | 92.9 ± 1.6 | 14.5 ± 0.5 | 60.4 ± 2.1 | 15.4 ± 0.4 | 68.9 ± 2.4 |
| Cox Model (no match) | 5.2 ± 0.6 | 36.2 ± 2.2 | 8.2 ± 0.7 | 78.7 ± 3.1 | 7.7 ± 0.7 | 58.0 ± 1.3 | 8.6 ± 0.7 | 67.5 ± 1.5 |
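  • Under standard assumptions and the equal-scale distance constraint, the proposed double-matching method yields consistent HTE estimates.
  • Distance metrics learned from the cure and timing components improve match quality relative to Euclidean distance and to standard propensity-score or prognostic-score methods.
  • Across multiple simulation settings, the method outperforms several baselines in mean absolute error (MAE) for both cure probability and CMET, and approaches oracle performance in some settings.
  • On real data (ALL; Haplo-SCT versus MSDT), the MCM approach produces concentrated, symmetric HTE distributions with clear between-group differences, and captures heterogeneity in cure and timing better than the unmatched Cox model.
  • The framework decomposes estimation error into contributions from censoring, model fitting, and irreducible noise, clarifying the sources of uncertainty.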
Figure 2: Absolute HTE Estimation Error on Cure Probability.

This review was generated by AI and reviewed by a human editor.