QUICK REVIEW

[Paper Review] Terabyte-scale Deep Multiple Instance Learning for Classification and Localization in Pathology

Gabriele Campanella, Vitor Werneck Krauss Silva|arXiv (Cornell University)|May 17, 2018

AI in cancer detection | References: 11 | Cited by: 41

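One-sentence summary

The paper proposes a terabyte-scale deep multiple instance learning (MIL) framework that classifies whole slide images (WSIs) of prostate needle biopsies using only slide-level labels, reaching an AUC of 0.98 on a held-out test set.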

ABSTRACT

In the field of computational pathology, the use of decision support systems powered by state-of-the-art deep learning solutions has been hampered by the lack of large labeled datasets. Until recently, studies relied on datasets in the order of few hundreds of slides which are not enough to train a model that can work at scale in the clinic. Here, we have gathered a dataset consisting of 12,160 slides, two orders of magnitude larger than previous datasets in pathology and equivalent to 25 times the pixel count of the entire ImageNet dataset. Given the size of our dataset it is possible for us to train a deep learning model under the Multiple Instance Learning (MIL) assumption where only the overall slide diagnosis is necessary for training, avoiding all the expensive pixel-wise annotations that are usually part of supervised learning approaches. We test our framework on a complex task, that of prostate cancer diagnosis on needle biopsies. We performed a thorough evaluation of the performance of our MIL pipeline under several conditions achieving an AUC of 0.98 on a held-out test set of 1,824 slides. These results open the way for training accurate diagnosis prediction models at scale, laying the foundation for decision support system deployment in the clinic.

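Motivation and objectives

Show that MIL can scale to terabytes of whole slide images when only slide-level labels are available.
Show that a large WSI dataset makes it possible to train high-performing deep MIL models for pathology classification.
Evaluate how the tiling strategy, magnification, and class weighting affect MIL performance.
Identify the model architectures and training settings that reach clinically relevant diagnostic accuracy.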

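Proposed method

Tile the WSIs at several magnifications (5x, 10x, 20x) and discard background tiles.
Treat each slide as a bag of tiles, and rank the tiles within each bag by their predicted probability of being positive.
Train on the top-ranked tile from each slide, using the slide label as its target and cross-entropy as the loss.
Weight the loss to counter class imbalance (weights tested: 0.5, 0.7, 0.9, 0.95, 0.99; final choice w1 = 0.9).
Train CNN classifiers with the Adam optimizer, comparing several architectures (AlexNet, VGG11-BN, ResNet18/34).
At test time, run inference on every tile and call a slide positive if any of its tiles is positive.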
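The tile selection, weighted loss, and slide-level decision described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' implementation. The function names, the decision threshold of 0.5, and the choice to weight the negative class by 1 - w1 are assumptions made for this example; the summary only states that w1 = 0.9 was selected.

```python
import math


def top_tile_index(tile_probs):
    """Return the index of the tile with the highest positive probability."""
    return max(range(len(tile_probs)), key=lambda i: tile_probs[i])


def mil_training_targets(bags, labels):
    """Pair the top-ranked tile of each slide (bag) with that slide's label.

    bags   : list of lists, one list of tile probabilities per slide
    labels : list of slide-level labels (0 = benign, 1 = cancer)
    """
    return [(top_tile_index(probs), label) for probs, label in zip(bags, labels)]


def weighted_cross_entropy(p_pos, label, w1=0.9):
    """Binary cross-entropy for one tile.

    Assumption: the positive class is weighted by w1 and the negative
    class by 1 - w1.
    """
    eps = 1e-12  # guards against log(0)
    if label == 1:
        return -w1 * math.log(max(p_pos, eps))
    return -(1.0 - w1) * math.log(max(1.0 - p_pos, eps))


def predict_slide(tile_probs, threshold=0.5):
    """Call a slide positive if any tile reaches the threshold (assumed 0.5)."""
    return int(max(tile_probs) >= threshold)


if __name__ == "__main__":
    bags = [[0.1, 0.8, 0.3], [0.2, 0.05]]  # hypothetical tile probabilities
    labels = [1, 0]
    for (idx, label), probs in zip(mil_training_targets(bags, labels), bags):
        loss = weighted_cross_entropy(probs[idx], label)
        print(f"tile {idx}, label {label}, loss {loss:.4f}, "
              f"slide prediction {predict_slide(probs)}")
```

In a real pipeline the tile probabilities would come from a forward pass of the CNN over every tile of the slide, and the loss would be backpropagated only through the selected tile.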

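Experimental results

Research questions

RQ1: Can MIL training on terabyte-scale WSIs reach high diagnostic accuracy using only slide-level labels?
RQ2: How does dataset size affect MIL performance on whole-slide prostate cancer classification?
RQ3: Which CNN architectures and magnification strategies maximize MIL performance for WSI diagnosis?
RQ4: How do class weighting and data augmentation affect MIL generalization?
RQ5: Does a multi-scale ensemble improve MIL performance on WSI classification?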

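Key findings

The best models (ResNet34 and VGG11-BN) reach AUCs of about 0.976 to 0.977 on the test set (1,824 slides).
With the best-performing MIL approach, the AUC on the held-out test set reaches 0.98.
After error analysis, the false positive rate on the test set is 3.7% and the false negative rate is 9.4%.
Combining predictions across magnifications (5x, 10x, 20x) outperforms any single magnification and improves ROC performance.
A large dataset is essential for MIL-based WSI classification to generalize.
Magnification affects performance; lower magnifications lead to higher error rates.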
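For readers who want to reproduce this kind of evaluation on their own slide-level scores, the metrics quoted above can be computed as in the sketch below. It is illustrative only: the function names and sample numbers are hypothetical, and averaging per-magnification probabilities is an assumed way of combining scales, since this summary does not say how the predictions were fused.

```python
def error_rates(labels, preds):
    """Return (false positive rate, false negative rate) for binary labels."""
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    negatives = sum(1 for y in labels if y == 0)
    positives = sum(1 for y in labels if y == 1)
    return fp / negatives, fn / positives


def auc(labels, scores):
    """AUC as the probability that a random positive slide outscores a
    random negative slide, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ensemble_mean(scores_by_magnification):
    """Average slide scores across magnifications.

    Assumption: simple averaging; other fusion rules are possible.
    """
    n_models = len(scores_by_magnification)
    return [sum(col) / n_models for col in zip(*scores_by_magnification)]


if __name__ == "__main__":
    labels = [1, 1, 0, 0]            # hypothetical slide labels
    scores = [0.9, 0.4, 0.5, 0.1]    # hypothetical slide scores
    preds = [int(s >= 0.5) for s in scores]
    print("AUC:", auc(labels, scores))
    print("FPR, FNR:", error_rates(labels, preds))
```

The pairwise AUC is quadratic in the number of slides, which is fine for a test set of a few thousand slides; a rank-based formulation would be preferable for much larger sets.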

This review was generated by AI and checked by a human editor.