QUICK REVIEW

[Paper Review] Towards A Generalizable Pathology Foundation Model via Unified Knowledge Distillation

Jiabo Ma, Zhengrui Guo|arXiv (Cornell University)|Jul 26, 2024

AI in cancer detection|Cited by 5

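One-Sentence Summary

GPFM is a generalizable pathology foundation model pretrained through unified knowledge distillation from multiple expert models; it achieves the best overall performance across 39 clinical tasks (average rank 1.36).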

ABSTRACT

Foundation models pretrained on large-scale datasets are revolutionizing the field of computational pathology (CPath). The generalization ability of foundation models is crucial for the success in various downstream clinical tasks. However, current foundation models have only been evaluated on a limited type and number of tasks, leaving their generalization ability and overall performance unclear. To address this gap, we established a most comprehensive benchmark to evaluate the performance of off-the-shelf foundation models across six distinct clinical task types, encompassing a total of 72 specific tasks, including slide-level classification, survival prediction, ROI-tissue classification, ROI retrieval, visual question answering, and report generation. Our findings reveal that existing foundation models excel at certain task types but struggle to effectively handle the full breadth of clinical tasks. To improve the generalization of pathology foundation models, we propose a unified knowledge distillation framework consisting of both expert and self-knowledge distillation, where the former allows the model to learn from the knowledge of multiple expert models, while the latter leverages self-distillation to enable image representation learning via local-global alignment. Based on this framework, we curated a dataset of 96,000 whole slide images (WSIs) and developed a Generalizable Pathology Foundation Model (GPFM). This advanced model was trained on a substantial dataset comprising 190 million images extracted from approximately 72,000 publicly available slides, encompassing 34 major tissue types. Evaluated on the established benchmark, GPFM achieves an impressive average rank of 1.6, with 42 tasks ranked 1st, while the second-best model, UNI, attains an average rank of 3.7, with only 6 tasks ranked 1st.

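Research Motivation and Objectives

Motivate the need for foundation models that generalize across diverse pathology tasks.
Create a comprehensive benchmark that evaluates off-the-shelf pathology foundation models on 39 tasks spanning six clinical task types.
Propose a unified knowledge distillation framework that combines expert distillation with self-distillation to improve generalization.
Pretrain a Generalizable Pathology Foundation Model (GPFM) on a large, diverse dataset of WSIs to test its generalization.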

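Proposed Method

Introduces a unified knowledge distillation framework that combines Expert Knowledge Distillation with Self-Distillation.
Pretrains GPFM with Mask Image Modeling (MIM) and parameter updates based on an exponential moving average (EMA).
Assembles a large multi-source dataset of 86,104 WSIs from 34 tissue types, yielding about 190 million images.
Evaluates on a comprehensive benchmark covering WSI classification, survival analysis, ROI tissue classification, image retrieval, VQA, and report generation.
Compares against existing foundation models (e.g., UNI, Phikon, CONCH, CTransPath) using rank-based statistical analysis (Wilcoxon test, Nemenyi test).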
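
To make the two distillation signals and the EMA update more concrete, here is a toy sketch in plain Python. It is not the paper's implementation: the cosine-distance objective, the `expert_weight` parameter, the momentum value, and all function names are assumptions chosen for illustration, and the MIM term is left out entirely.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def unified_distillation_loss(student_global, student_local, teacher_global,
                              expert_embeddings, expert_weight=1.0):
    """Combine a self-distillation term and an expert-distillation term.

    Assumption: both terms use cosine distance; the paper may use other losses.
    """
    # Self-distillation: pull the student's local-view embedding toward the
    # EMA teacher's global-view embedding (local-global alignment).
    self_term = cosine_distance(student_local, teacher_global)
    # Expert distillation: pull the student's global-view embedding toward
    # each expert's embedding of the same image, averaged over experts.
    expert_term = sum(
        cosine_distance(student_global, e) for e in expert_embeddings
    ) / len(expert_embeddings)
    return self_term + expert_weight * expert_term

def ema_update(teacher_params, student_params, momentum=0.99):
    """Move each teacher parameter a small step toward the student's value."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_params, student_params)]
```

In this sketch the teacher is never trained by gradient descent; it only tracks the student through `ema_update`, which is the usual role of an EMA teacher in self-distillation setups.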

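Experimental Results

Research Questions

RQ1: Can a pathology foundation model generalize across a wide range of tasks through unified knowledge distillation?
RQ2: How does GPFM perform relative to existing models on 39 diverse CPath tasks?
RQ3: How does expert knowledge distillation affect downstream task performance?
RQ4: Does distilling from expert models improve robustness and generalization across tasks?
RQ5: How does GPFM perform on external validation datasets and across task categories (WSI classification, survival, ROI classification, retrieval, VQA, report generation)?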

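Key Findings

GPFM achieves an average rank of 1.36 across 39 tasks and ranks first on 29 of them.
The second-best model, UNI, has an average rank of 2.96 and ranks first on 4 tasks.
The Wilcoxon test shows that GPFM significantly outperforms the other models (p < 0.001).
On WSI classification, GPFM achieves the best average AUC (0.956) and is among the top performers in balanced accuracy (0.833) and weighted F1 (0.834).
On ROI classification, GPFM achieves the best average AUC (0.955) and leads on multiple datasets; in external validation, it has an average rank of 1.5 across three datasets.
GPFM performs strongly on gene mutation prediction (e.g., LUAD-TP53, AUC 0.855; Glioma IDH1, AUC 0.998).
Ablation studies show that removing Expert Knowledge Distillation lowers AUC by 0.6%, weighted F1 by 1.8%, and balanced accuracy by 1.8% on average.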
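
The headline numbers above are an average rank and a count of first-place finishes. The sketch below shows one plausible way to compute both from a table of per-task scores. The tie handling (tied models share the average of their positions), the helper names, and the toy scores for the placeholder models `model_a`, `model_b`, and `model_c` are all assumptions for illustration; none of them come from the paper, and the significance tests are not covered here.

```python
def rank_models(scores):
    """Rank models on one task: higher score is better, rank 1 is best.

    Tied models share the average of the positions they occupy.
    """
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of models tied with position i.
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        shared_rank = (i + 1 + j + 1) / 2.0
        for k in range(i, j + 1):
            ranks[ordered[k][0]] = shared_rank
        i = j + 1
    return ranks

def summarize(task_scores):
    """Map each model to (average rank, number of tasks ranked strictly first).

    Assumes every model has a score on every task.
    """
    totals, firsts = {}, {}
    for scores in task_scores.values():
        for model, rank in rank_models(scores).items():
            totals[model] = totals.get(model, 0.0) + rank
            firsts[model] = firsts.get(model, 0) + (1 if rank == 1.0 else 0)
    n_tasks = len(task_scores)
    return {m: (totals[m] / n_tasks, firsts[m]) for m in totals}

# Made-up scores for three placeholder models on three placeholder tasks.
toy_scores = {
    "task_1": {"model_a": 0.95, "model_b": 0.90, "model_c": 0.85},
    "task_2": {"model_a": 0.80, "model_b": 0.88, "model_c": 0.70},
    "task_3": {"model_a": 0.91, "model_b": 0.89, "model_c": 0.89},
}
summary = summarize(toy_scores)
```

With these toy scores, `model_a` ranks first on two of the three tasks and has the lowest average rank. A model that only ties for the top score is not counted as ranked first in this sketch.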

This review was generated by AI and checked by a human editor.