QUICK REVIEW

[Paper Review] Longitudinal Risk Prediction in Mammography with Privileged History Distillation

Banafsheh Karimian, Alexis Guichemerre|arXiv (Cornell University)|Mar 16, 2026

AI in cancer detection | Cited by 0

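One-Sentence Summary

The paper proposes Privileged History Distillation (PHD): horizon-specific teachers trained on the full longitudinal history distill their signal into a student, so that multi-year risk can be predicted at inference time from the current mammogram alone.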

ABSTRACT

Breast cancer remains a leading cause of cancer-related mortality worldwide. Longitudinal mammography risk prediction models improve multi-year breast cancer risk prediction based on prior screening exams. However, in real-world clinical practice, longitudinal histories are often incomplete, irregular, or unavailable due to missed screenings, first-time examinations, heterogeneous acquisition schedules, or archival constraints. The absence of prior exams degrades the performance of longitudinal risk models and limits their practical applicability. While substantial longitudinal history is available during training, prior exams are commonly absent at test time. In this paper, we address missing history at inference time and propose a longitudinal risk prediction method that uses mammography history as privileged information during training and distills its prognostic value into a student model that only requires the current exam at inference time. The key idea is a privileged multi-teacher distillation scheme with horizon-specific teachers: each teacher is trained on the full longitudinal history to specialize in one prediction horizon, while the student receives only a reconstructed history derived from the current exam. This allows the student to inherit horizon-dependent longitudinal risk cues without requiring prior screening exams at deployment. Our new Privileged History Distillation (PHD) method is validated on a large longitudinal mammography dataset with multi-year cancer outcomes, CSAW-CC, comparing full-history and no-history baselines to their distilled counterparts. Using time-dependent AUC across horizons, our privileged history distillation method markedly improves the performance of long-horizon prediction over no-history models and is comparable to that of full-history models, while using only the current exam at inference time.

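Motivation and Objectives

Address the gap that arises at deployment when longitudinal history is unavailable because of missed screenings or first-time examinations.
Develop a training framework that uses the full longitudinal history as privileged information.
Produce a student model that predicts multi-year risk from the current exam alone, using a reconstructed history.
Transfer horizon-specific long-term risk cues through multi-teacher distillation.
Demonstrate improved long-horizon performance under no-history inference on the CSAW-CC dataset.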

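Proposed Method

Encode each mammogram into a visit embedding with a frozen Mirai-based image encoder.
Predict the missing history embeddings from the current exam with a history prediction module.
Aggregate the sequence (current exam plus reconstructed history) with a longitudinal encoder and an additive risk head to predict risk over multiple years.
Train horizon-specific teacher experts on the full history and distill their logits into a student model that runs on the reconstructed history.
Combine a per-horizon RCE loss with KL-based logit distillation, weighted by a tunable coefficient λ_l.
Train end to end with the Adam optimizer and a cosine learning-rate schedule.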
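
The training objective described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Several pieces are assumptions made for illustration: the additive risk head is assumed to accumulate non-negative per-horizon increments on top of a base logit, the RCE loss (not defined in this review) is replaced by a masked binary cross-entropy stand-in, and the KL term is taken between per-horizon Bernoulli risk probabilities. All function and argument names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def additive_risk_logits(base, increments):
    """Cumulative logits: logit_k = base + sum over j <= k of relu(increment_j).

    Assumed form of the additive risk head; it keeps risk non-decreasing in k.
    base: (batch, 1); increments: (batch, n_horizons).
    """
    return base + np.cumsum(np.maximum(increments, 0.0), axis=1)

def supervised_term(student_logits, labels, mask, eps=1e-7):
    """Masked binary cross-entropy, used here as a stand-in for the RCE loss."""
    p = np.clip(sigmoid(student_logits), eps, 1.0 - eps)
    bce = -(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p))
    # mask is 1 where the horizon label is known and 0 otherwise
    return float((bce * mask).sum() / max(mask.sum(), 1.0))

def distillation_term(student_logits, teacher_logits, lam, eps=1e-7):
    """Bernoulli KL(teacher || student) per horizon, weighted by lam[l].

    Column l of teacher_logits is assumed to come from the horizon-l teacher.
    """
    p_t = np.clip(sigmoid(teacher_logits), eps, 1.0 - eps)
    p_s = np.clip(sigmoid(student_logits), eps, 1.0 - eps)
    kl = p_t * np.log(p_t / p_s) + (1.0 - p_t) * np.log((1.0 - p_t) / (1.0 - p_s))
    return float((kl * lam).sum(axis=1).mean())

def phd_style_loss(student_logits, teacher_logits, labels, mask, lam):
    """Supervised term plus horizon-weighted distillation term."""
    return (supervised_term(student_logits, labels, mask)
            + distillation_term(student_logits, teacher_logits, lam))
```

In this sketch the student logits would come from the reconstructed-history pathway and each teacher column from a model that sees the full history. Setting an entry of `lam` to zero switches off distillation for that horizon, which is one way to mimic a fewer-teachers ablation.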

Figure 1 : Partial AUC at 10% FPR (pAUC@10%) for LoMaR and VMRA at 4- and 5-year horizons as a function of available screening history.
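
For readers unfamiliar with the pAUC@10% metric named in the Figure 1 caption, the following sketch shows one common way to compute a partial AUC up to a 10% false-positive rate. The McClish standardization applied at the end (which maps a chance-level curve to 0.5 and a perfect one to 1.0) is an assumption; this review does not state which normalization the paper uses. The function names are hypothetical, and the code assumes both classes are present and `max_fpr` is below 1.

```python
import numpy as np

def roc_points(y_true, scores):
    """Return (fpr, tpr) arrays of the empirical ROC curve, starting at (0, 0)."""
    order = np.argsort(-scores, kind="stable")
    y = y_true[order]
    s = scores[order]
    # Keep one point per distinct score (last index of each tie group)
    idx = np.r_[np.where(np.diff(s))[0], y.size - 1]
    tps = np.cumsum(y)[idx]
    fps = (idx + 1) - tps
    tpr = np.r_[0.0, tps / tps[-1]]
    fpr = np.r_[0.0, fps / fps[-1]]
    return fpr, tpr

def partial_auc(y_true, scores, max_fpr=0.10):
    """Standardized partial AUC over FPR in [0, max_fpr] (assumes max_fpr < 1)."""
    fpr, tpr = roc_points(np.asarray(y_true), np.asarray(scores, dtype=float))
    # Truncate the curve at max_fpr, interpolating the TPR at the cut point
    stop = np.searchsorted(fpr, max_fpr, side="right")
    tpr_cut = np.interp(max_fpr, [fpr[stop - 1], fpr[stop]], [tpr[stop - 1], tpr[stop]])
    f = np.r_[fpr[:stop], max_fpr]
    t = np.r_[tpr[:stop], tpr_cut]
    area = np.sum(np.diff(f) * (t[1:] + t[:-1]) / 2.0)  # trapezoid rule
    # McClish standardization (assumed): chance -> 0.5, perfect -> 1.0
    min_area = 0.5 * max_fpr ** 2
    max_area = max_fpr
    return float(0.5 * (1.0 + (area - min_area) / (max_area - min_area)))
```

A perfectly separating score yields 1.0 under this definition, while a score that ranks every negative above every positive falls below 0.5.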

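Experimental Results

Research Questions

RQ1: Can the longitudinal risk signal learned from the full history be transferred to a model that uses only the current exam at inference time?
RQ2: Do horizon-specific teachers improve long-term risk prediction under no-history inference?
RQ3: How does Privileged History Distillation affect performance at 1- to 5-year horizons relative to full-history and no-history baselines?

Key Findings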

Model	#H	1y AUC	2y AUC	3y AUC	4y AUC	5y AUC	1y pAUC	2y pAUC	3y pAUC	4y pAUC	5y pAUC
LoMaR	4	0.914 ±0.023	0.865 ±0.020	0.851 ±0.017	0.841 ±0.019	0.851 ±0.016	0.817 ±0.023	0.749 ±0.020	0.738 ±0.018	0.731 ±0.018	0.740 ±0.018
VMRA	4	0.920 ±0.019	0.868 ±0.020	0.851 ±0.017	0.842 ±0.017	0.851 ±0.017	0.822 ±0.020	0.752 ±0.020	0.736 ±0.018	0.728 ±0.019	0.745 ±0.021
Mirai	0	0.924 ±0.020	0.872 ±0.016	0.853 ±0.015	0.837 ±0.014	0.829 ±0.015	0.824 ±0.023	0.753 ±0.019	0.735 ±0.018	0.715 ±0.018	0.711 ±0.021
LoMaR+PHD	0	0.913 ±0.022	0.865 ±0.019	0.852 ±0.016	0.845 ±0.015	0.853 ±0.015	0.810 ±0.031	0.744 ±0.024	0.734 ±0.015	0.735 ±0.020	0.752 ±0.019
VMRA+PHD	0	0.920 ±0.018	0.869 ±0.018	0.852 ±0.016	0.847 ±0.015	0.855 ±0.017	0.818 ±0.020	0.749 ±0.017	0.733 ±0.017	0.734 ±0.017	0.757 ±0.018

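The PHD-based models (LoMaR+PHD and VMRA+PHD) mitigate the performance loss when history is unavailable, approaching or matching the full-history models.
Across horizons, the distilled models show the largest gains at longer horizons (4 to 5 years) and in the low false-positive-rate (low-FPR) regime.
Multi-teacher distillation with 5 teachers yields the strongest gains, most notably at the 5-year horizon.
Relative to the no-history baseline, VMRA+PHD and LoMaR+PHD achieve higher full-curve AUC and pAUC for 4- and 5-year prediction.
Ablations indicate that horizon-aligned distillation is essential: removing KD or using fewer teachers reduces the gains.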
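
To make the long-horizon gains concrete, the short sketch below subtracts the no-history Mirai row from the two PHD rows of the results table at the 4- and 5-year horizons. The numbers are the table means copied verbatim; the ± terms are not used, so the differences are point estimates only. The dictionary layout and the helper name are illustrative choices, not part of the paper.

```python
# Mean values copied from the results table (4- and 5-year horizons only).
results = {
    "Mirai":     {"auc": {4: 0.837, 5: 0.829}, "pauc": {4: 0.715, 5: 0.711}},
    "LoMaR+PHD": {"auc": {4: 0.845, 5: 0.853}, "pauc": {4: 0.735, 5: 0.752}},
    "VMRA+PHD":  {"auc": {4: 0.847, 5: 0.855}, "pauc": {4: 0.734, 5: 0.757}},
}

def gain_over_baseline(model, metric, horizon, baseline="Mirai"):
    """Difference in the chosen metric between a model and the baseline row."""
    delta = results[model][metric][horizon] - results[baseline][metric][horizon]
    return round(delta, 3)

if __name__ == "__main__":
    for model in ("LoMaR+PHD", "VMRA+PHD"):
        for metric in ("auc", "pauc"):
            for horizon in (4, 5):
                gain = gain_over_baseline(model, metric, horizon)
                print(f"{model} {metric} {horizon}y: {gain:+.3f}")
```

At the 5-year horizon, for example, VMRA+PHD is 0.026 above Mirai in AUC and 0.046 above it in pAUC, which matches the pattern of larger gains in the low-FPR regime.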

Figure 2 : Proposed PHD method for longitudinal risk prediction in mammography. Visit embeddings are extracted from each exam (mammogram), and missing historical embeddings are predicted from the current exam. The generated sequence is aggregated by a longitudinal model and passed to an additive haz


This review was generated by AI and reviewed by a human editor.