QUICK REVIEW

[Paper Review] Cluster-to-Conquer: A Framework for End-to-End Multi-Instance Learning for Whole Slide Image Classification

Yash Sharma, Aman Shrivastava|arXiv (Cornell University)|Mar 19, 2021

AI in cancer detection | References: 26 | Cited by: 51

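One-Sentence Summary

C2C is an end-to-end MIL framework for WSI classification: it clusters the patches of each slide, samples from the clusters, and uses adaptive attention with KL-divergence regularization to improve slide-level prediction.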

ABSTRACT

In recent years, the availability of digitized Whole Slide Images (WSIs) has enabled the use of deep learning-based computer vision techniques for automated disease diagnosis. However, WSIs present unique computational and algorithmic challenges. WSIs are gigapixel-sized ($\sim$100K pixels), making them infeasible to be used directly for training deep neural networks. Also, often only slide-level labels are available for training as detailed annotations are tedious and can be time-consuming for experts. Approaches using multiple-instance learning (MIL) frameworks have been shown to overcome these challenges. Current state-of-the-art approaches divide the learning framework into two decoupled parts: a convolutional neural network (CNN) for encoding the patches followed by an independent aggregation approach for slide-level prediction. In this approach, the aggregation step has no bearing on the representations learned by the CNN encoder. We have proposed an end-to-end framework that clusters the patches from a WSI into ${k}$-groups, samples ${k}'$ patches from each group for training, and uses an adaptive attention mechanism for slide level prediction; Cluster-to-Conquer (C2C). We have demonstrated that dividing a WSI into clusters can improve the model training by exposing it to diverse discriminative features extracted from the patches. We regularized the clustering mechanism by introducing a KL-divergence loss between the attention weights of patches in a cluster and the uniform distribution. The framework is optimized end-to-end on slide-level cross-entropy, patch-level cross-entropy, and KL-divergence loss (Implementation: https://github.com/YashSharma/C2C).

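Research Motivation and Objectives

Address the challenges of MIL-based classification of gigapixel WSIs by enabling end-to-end learning.
Use cluster-based sampling to expose the model to diverse, discriminative patch features.
Combine patch encoding, attention-based aggregation, and KL-divergence regularization to improve the joint learning of patch and slide representations.
Demonstrate performance comparable to or better than two-stage MIL methods on a gastrointestinal disease dataset and a breast cancer dataset.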

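Proposed Method

Cluster the patches of each WSI into k clusters by running K-means on patch embeddings.
Sample k' patches from each cluster to form a manageable training subset for each WSI.
Encode the patches with a CNN encoder; compute the patch representations h and apply a two-layer attention module to obtain the instance weights a_n.
Aggregate the patch representations into a WSI representation z via attention pooling, and predict the slide-level label.
Train end-to-end with the loss L = alpha*L_WSI + beta*L_Patch + gamma*L_KLD, where L_KLD regularizes the variance of attention within each cluster.
The regularizer is a KL divergence between the attention distribution of the patches within each cluster and the uniform distribution.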
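The sampling, pooling, and loss steps listed above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation (see the linked repository for that): cluster labels are taken as already computed by K-means, the attention scores stand in for the output of the two-layer attention module, the KL term is taken in the direction KL(attention || uniform) and summed over clusters, and all function names and default weights are hypothetical.

```python
import math
import random
from collections import defaultdict


def sample_per_cluster(cluster_ids, k_prime, rng):
    """Pick up to k_prime patch indices from each cluster of one slide."""
    by_cluster = defaultdict(list)
    for idx, c in enumerate(cluster_ids):
        by_cluster[c].append(idx)
    sampled = []
    for c in sorted(by_cluster):
        members = by_cluster[c]
        sampled.extend(rng.sample(members, min(k_prime, len(members))))
    return sampled


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(patch_feats, scores):
    """Return z = sum_n a_n * h_n and the weights a_n = softmax(scores)."""
    a = softmax(scores)
    dim = len(patch_feats[0])
    z = [sum(a_n * h[d] for a_n, h in zip(a, patch_feats)) for d in range(dim)]
    return z, a


def kl_to_uniform(weights):
    """KL(p || uniform), where p is the weights renormalised to sum to 1."""
    total = sum(weights)
    n = len(weights)
    return sum((w / total) * math.log((w / total) * n) for w in weights if w > 0)


def cluster_kld(attn, cluster_ids):
    """Sum the per-cluster KL terms (summing is an assumption of this sketch)."""
    by_cluster = defaultdict(list)
    for a_n, c in zip(attn, cluster_ids):
        by_cluster[c].append(a_n)
    return sum(kl_to_uniform(ws) for ws in by_cluster.values())


def total_loss(l_wsi, l_patch, l_kld, alpha=1.0, beta=1.0, gamma=1.0):
    """L = alpha * L_WSI + beta * L_Patch + gamma * L_KLD (placeholder weights)."""
    return alpha * l_wsi + beta * l_patch + gamma * l_kld


# Toy usage with made-up numbers.
rng = random.Random(0)
cluster_ids = [0, 0, 0, 1, 1, 2]           # hypothetical K-means labels for 6 patches
picked = sample_per_cluster(cluster_ids, k_prime=2, rng=rng)
feats = [[float(i), 1.0] for i in picked]  # stand-in patch representations h_n
scores = [0.1 * i for i in picked]         # stand-in attention scores
z, attn = attention_pool(feats, scores)
l_kld = cluster_kld(attn, [cluster_ids[i] for i in picked])
loss = total_loss(l_wsi=0.7, l_patch=0.6, l_kld=l_kld, alpha=1.0, beta=0.5, gamma=0.1)
```

The KL term is zero when every patch in a cluster receives the same attention and grows as attention concentrates on a few patches, which is the sense in which it regularizes within-cluster attention variance. In a real training loop the two cross-entropy terms and the attention scores would come from the network, and the whole expression would be differentiated by an autograd framework.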

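Experimental Results

Research Questions

RQ1: How does cluster-based sampling affect the diversity and quality of the patch representations that MIL learns on WSIs?
RQ2: Can end-to-end training combined with attention-based aggregation exceed two-stage MIL methods in slide-level accuracy?
RQ3: What effect does KL-divergence regularization have on the attention distribution and on model performance?
RQ4: How does C2C perform on GI biopsy WSIs and on CAMELYON16 breast cancer data compared with fully supervised and two-stage methods?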

Key Findings

Method	Accuracy (%)	Precision (%)	Recall (%)	F1-score (%)
Campanella-MIL	82.8	94.9	74.5	83.5
Campanella-MIL RNN	74.7	75.4	84.3	79.6
Two-Stage Mean	81.6	87.3	80.3	83.7
C2C (w WSI Loss)	81.6	80.7	90.1	85.2
C2C (w WSI+KLD Loss)	83.9	84.9	86.3	85.4
C2C (w WSI+Patch Loss)	85.1	86.5	88.2	87.4
C2C (w WSI+Patch+KLD Loss)	86.2	85.5	92.2	88.7
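As a quick sanity check on the table above, the F1 column can be recomputed from the precision and recall columns as their harmonic mean. The snippet below is only an illustrative check using the values copied from the table; because the tabulated precision and recall are already rounded, the recomputed values can differ from the reported ones by a few tenths of a point.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table, in percent.
rows = {
    "Campanella-MIL": (94.9, 74.5, 83.5),
    "Campanella-MIL RNN": (75.4, 84.3, 79.6),
    "Two-Stage Mean": (87.3, 80.3, 83.7),
    "C2C (w WSI Loss)": (80.7, 90.1, 85.2),
    "C2C (w WSI+KLD Loss)": (84.9, 86.3, 85.4),
    "C2C (w WSI+Patch Loss)": (86.5, 88.2, 87.4),
    "C2C (w WSI+Patch+KLD Loss)": (85.5, 92.2, 88.7),
}

for name, (p, r, reported) in rows.items():
    print(f"{name}: recomputed F1 {f1_score(p, r):.2f}, reported {reported}")
```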

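C2C outperforms the two-stage MIL baselines on the gastrointestinal dataset for celiac disease vs. normal classification.
Combining the WSI loss, patch loss, and KL-divergence loss yields the highest accuracy (86.2) and F1 score (88.7) in the table, with competitive precision and recall.
C2C achieves strong performance on CAMELYON16 when trained with slide-level labels only and a ResNet-18 backbone.
KL-divergence regularization stabilizes attention across positive instance classes, as shown in the MNIST bag experiments.
Cluster-based sampling increases exposure to diverse, discriminative patches, which supports end-to-end learning.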

This review was generated by AI and checked by a human editor.