QUICK REVIEW

[Paper Review] MultiNet with Transformers: A Model for Cancer Diagnosis Using Images

Hosein Barzekar, Yash Patel | arXiv (Cornell University) | Jan 21, 2023

AI in cancer detection | Cited by 9

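One-sentence summary

This paper proposes MultiNet-ViT, a transformer-augmented multiclass classifier for breast cancer histopathology that combines CNN backbones with a Vision Transformer and classifies BreakHis images across eight classes and multiple magnifications.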

ABSTRACT

Cancer is a leading cause of death in many countries. An early diagnosis of cancer based on biomedical imaging ensures effective treatment and a better prognosis. However, biomedical imaging presents challenges to both clinical institutions and researchers. Physiological anomalies are often characterized by slight abnormalities in individual cells or tissues, making them difficult to detect visually. Traditionally, anomalies are diagnosed by radiologists and pathologists with extensive training. This procedure, however, demands the participation of professionals and incurs a substantial cost. The cost makes large-scale biological image classification impractical. In this study, we provide unique deep neural network designs for multiclass classification of medical images, in particular cancer images. We incorporated transformers into a multiclass framework to take advantage of data-gathering capability and perform more accurate classifications. We evaluated models on publicly accessible datasets using various measures to ensure the reliability of the models. Extensive assessment metrics suggest this method can be used for a multitude of classification tasks.

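Research motivation and goals

Advance early and accurate cancer diagnosis from histopathology slide images.
Develop a hybrid architecture that uses CNNs and transformers to classify multiple disease classes.
Improve generalization across the different magnifications of pathology images.
Evaluate the proposed model on BreakHis against other ViT- and CNN-based methods.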

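Proposed method

Integrate two transfer-learning backbones (VGG19 and ResNet) in a parallel MultiNet framework.
Fuse the CNN features with a ViT-based head to capture both global and local image information.
Introduce multi-scale analysis to handle the different magnifications of pathology images.
Train with cross-entropy loss and the Adam optimizer at a learning rate of 1e-4.
Concatenate the ViT MLP head with the MultiNet MLP head for eight-class classification.
Evaluate the model on BreakHis data at 40X, 100X, 200X, and 400X magnification.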
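
The fusion described in the method list can be sketched in PyTorch as follows. This is a minimal illustration and not the authors' implementation: `TinyCNN` and `TinyViT` are hypothetical stand-ins for the pretrained VGG19/ResNet backbones and the ViT encoder, all layer sizes are arbitrary assumptions, and positional embeddings are omitted for brevity. Only the overall pattern (two parallel CNN branches, a ViT branch, two MLP heads concatenated into an eight-way classifier, cross-entropy loss, Adam at 1e-4) comes from the summary above.

```python
import torch
from torch import nn


class TinyCNN(nn.Module):
    """Hypothetical stand-in for a pretrained CNN backbone (e.g. VGG19 or ResNet)."""

    def __init__(self, out_dim: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global average pool to (B, 16, 1, 1)
            nn.Flatten(),
            nn.Linear(16, out_dim),
        )

    def forward(self, x):
        return self.features(x)


class TinyViT(nn.Module):
    """Hypothetical stand-in for a ViT encoder: patch embedding plus transformer layers."""

    def __init__(self, patch: int = 16, dim: int = 64, depth: int = 2, heads: int = 4):
        super().__init__()
        # A strided convolution splits the image into patches and embeds them.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=2 * dim, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x):
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)  # (B, N, dim)
        return self.encoder(tokens).mean(dim=1)  # mean-pool tokens to (B, dim)


class FusionClassifier(nn.Module):
    """Concatenates a MultiNet-style MLP head with a ViT MLP head (sizes are assumptions)."""

    def __init__(self, num_classes: int = 8, cnn_dim: int = 64, vit_dim: int = 64, hidden: int = 128):
        super().__init__()
        self.cnn_a = TinyCNN(cnn_dim)  # plays the role of the first CNN backbone
        self.cnn_b = TinyCNN(cnn_dim)  # plays the role of the second CNN backbone
        self.vit = TinyViT(dim=vit_dim)
        self.multinet_head = nn.Sequential(nn.Linear(2 * cnn_dim, hidden), nn.ReLU())
        self.vit_head = nn.Sequential(nn.Linear(vit_dim, hidden), nn.ReLU())
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):
        cnn_feat = torch.cat([self.cnn_a(x), self.cnn_b(x)], dim=1)  # local features
        vit_feat = self.vit(x)  # global features
        fused = torch.cat([self.multinet_head(cnn_feat), self.vit_head(vit_feat)], dim=1)
        return self.classifier(fused)  # (B, num_classes) logits


def train_step(model, optimizer, images, labels):
    """One optimization step with cross-entropy loss; returns logits and the loss value."""
    criterion = nn.CrossEntropyLoss()
    logits = model(images)
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return logits.detach(), loss.item()


if __name__ == "__main__":
    model = FusionClassifier()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    images = torch.randn(4, 3, 64, 64)   # random stand-in for image patches
    labels = torch.randint(0, 8, (4,))   # random stand-in for the eight classes
    logits, loss = train_step(model, optimizer, images, labels)
    print(logits.shape, loss)
```

In a realistic setup the stand-in modules would be replaced by pretrained backbones, since the method relies on transfer learning; the concatenation and training step would stay structurally the same.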

Figure 1: MultiNet-ViT Architecture: Integrating MultiNet with ViT

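Experimental results

Research questions

RQ1: Can a hybrid CNN-Transformer architecture outperform CNN-only or Transformer-only models on multiclass breast cancer histopathology classification?
RQ2: Does integrating global (Transformer) and local (CNN) features improve discrimination of pathological subtypes across magnifications?
RQ3: How does MultiNet-ViT differ from other ViT/CNN combinations in BreakHis classification performance?
RQ4: On small datasets, is transfer learning critical for Transformer-based medical image classification?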

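Key findings

MultiNet-ViT performed best among all models, with average precision, recall, and F1 of 94%.
ViT alone reached high accuracy on several classes; for example, adenosis and papillary carcinoma reached 100% on some metrics.
Combining ViT with other backbones (ResNet, EfficientNet, DeiT) improved on or matched ViT-based performance on several classes.
DeiT-based ensembles often performed strongly in MultiNet, ViT, or ResNet configurations, and targets such as papillary carcinoma were often recognized well (for example, 100% recall).
The proposed transformer-backed ensembles generalized well across all BreakHis magnifications.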
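
For readers who want to see how averaged precision, recall, and F1 figures like those above are typically derived from per-class counts, here is a small pure-Python sketch. It assumes macro averaging (an unweighted mean over classes); the summary does not say which averaging scheme was used, and the function name `macro_prf` and the toy labels are illustrative only.

```python
from collections import Counter


def macro_prf(y_true, y_pred, labels):
    """Return macro-averaged (precision, recall, F1) over the given class labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for c in labels:
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


if __name__ == "__main__":
    # Toy two-class example, not data from the paper.
    y_true = ["adenosis", "adenosis", "papillary", "papillary"]
    y_pred = ["adenosis", "papillary", "papillary", "papillary"]
    print(macro_prf(y_true, y_pred, ["adenosis", "papillary"]))
```

A weighted average (weighting each class by its number of true samples) would give different numbers on an imbalanced dataset such as BreakHis, so the averaging scheme matters when comparing reported scores.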

This review was generated by AI and reviewed by a human editor.