QUICK REVIEW

[Paper Review] Skin Cancer Detection utilizing Deep Learning: Classification of Skin Lesion Images using a Vision Transformer

Carolin Flosdorf, Justin Engelker|arXiv (Cornell University)|Jul 26, 2024

Cutaneous Melanoma Detection and Management|Cited by 9

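One-Sentence Summary

This paper evaluates two pre-trained Vision Transformer (ViT) models, ViT_L16 and ViT_L32, on seven-class skin lesion classification with the HAM10000 dataset, and reports that they generally outperform traditional classifiers and CNN baselines on accuracy and melanoma recall.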

ABSTRACT

Skin cancer detection still represents a major challenge in healthcare. Common detection methods can be lengthy and require human assistance which falls short in many countries. Previous research demonstrates how convolutional neural networks (CNNs) can help effectively through both automation and an accuracy that is comparable to the human level. However, despite the progress in previous decades, the precision is still limited, leading to substantial misclassifications that have a serious impact on people's health. Hence, we employ a Vision Transformer (ViT) that has been developed in recent years based on the idea of a self-attention mechanism, specifically two configurations of a pre-trained ViT. We generally find superior metrics for classifying skin lesions after comparing them to base models such as decision tree classifier and k-nearest neighbor (KNN) classifier, as well as to CNNs and less complex ViTs. In particular, we attach greater importance to the performance of melanoma, which is the most lethal type of skin cancer. The ViT-L32 model achieves an accuracy of 91.57% and a melanoma recall of 58.54%, while ViT-L16 achieves an accuracy of 92.79% and a melanoma recall of 56.10%. This offers a potential tool for faster and more accurate diagnoses and an overall improvement for the healthcare sector.

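Motivation and Objectives

Advance automated, accurate skin cancer detection to help address physician shortages and long waiting times.
Assess whether pre-trained Vision Transformer models can outperform CNNs and traditional classifiers in skin lesion classification.
Focus on melanoma detection performance (recall), because melanoma is the most lethal type of skin cancer.
Use data augmentation to balance the classes, and evaluate on a held-out test set.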
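To make the class-balancing idea concrete, the sketch below computes how many augmented images each class would need to match the largest class. It assumes augmentation is applied to the training split only, so the held-out test set stays untouched. The function name `augmentation_plan` and the class counts are hypothetical, invented for illustration, and are not taken from the paper.

```python
import math


def augmentation_plan(class_counts, target=None):
    """Return, per class, how many extra images are needed to reach `target`.

    If `target` is None, the size of the largest class is used.
    """
    if target is None:
        target = max(class_counts.values())
    plan = {}
    for label, count in class_counts.items():
        deficit = max(target - count, 0)
        plan[label] = {
            "extra_images": deficit,
            # Upper bound on augmented copies needed per original image.
            "copies_per_original": math.ceil(deficit / count) if count else 0,
        }
    return plan


# Hypothetical training-split counts (NOT the real HAM10000 numbers).
toy_counts = {"nv": 600, "mel": 100, "bkl": 100, "bcc": 50,
              "akiec": 30, "vasc": 15, "df": 10}

plan = augmentation_plan(toy_counts)
for label, info in plan.items():
    print(label, info)
```

Rare classes need many augmented copies per original image, which is one reason heavy augmentation can raise the overfitting concern addressed in the research questions.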

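Proposed Method

Use two pre-trained ViT configurations (ViT_L16 and ViT_L32) with 224x224 inputs and 7 output classes.
Replace the ViT classification head with a seven-neuron softmax output layer, one neuron per lesion class.
Train with the SGD optimizer and cross-entropy loss, using early stopping, best-weight checkpointing, and learning-rate scheduling.
Apply data augmentation (rotation, translation, brightness, zoom) to address class imbalance.
Compare the ViT models against decision tree (DTC), KNN, and CNN baselines, as well as previously reported ViT and CNN results.
Report test-set accuracy and melanoma-specific recall to emphasize detection of the most lethal cancer type.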
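The training controls named above (early stopping, best-weight checkpointing, learning-rate scheduling) can be sketched without any deep learning framework. This summary does not say which schedule the authors used, so the reduce-on-plateau rule here is an assumption, as are the class name `TrainingController`, the patience values, and the made-up validation losses.

```python
class TrainingController:
    """Tracks validation loss to drive checkpointing, LR decay, and early stopping."""

    def __init__(self, lr, stop_patience=5, lr_patience=2, lr_factor=0.5):
        self.lr = lr
        self.stop_patience = stop_patience  # epochs without improvement before stopping
        self.lr_patience = lr_patience      # epochs without improvement before each LR cut
        self.lr_factor = lr_factor
        self.best_loss = float("inf")
        self.best_epoch = None
        self.epochs_without_improvement = 0

    def update(self, epoch, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_epoch = epoch  # a real loop would save the model weights here
            self.epochs_without_improvement = 0
            return False
        self.epochs_without_improvement += 1
        if self.epochs_without_improvement % self.lr_patience == 0:
            self.lr *= self.lr_factor  # assumed reduce-on-plateau schedule
        return self.epochs_without_improvement >= self.stop_patience


# Made-up validation losses, for illustration only.
val_losses = [0.90, 0.70, 0.60, 0.62, 0.61, 0.63, 0.64, 0.65, 0.66]

controller = TrainingController(lr=0.01)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if controller.update(epoch, loss):
        stopped_at = epoch
        break

print("stopped at epoch", stopped_at, "best epoch", controller.best_epoch)
```

In a real training loop, the weights saved at `best_epoch` would be restored before evaluating on the test set, so the reported metrics come from the best checkpoint rather than the last epoch.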

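Experimental Results

Research Questions

RQ1: Do large pre-trained ViT models (ViT_L16, ViT_L32) outperform traditional machine learning models and CNN-based methods on the HAM10000 skin cancer dataset?
RQ2: What accuracy and melanoma recall do ViT_L16 and ViT_L32 achieve in seven-class skin lesion classification?
RQ3: How does data augmentation affect model performance and potential overfitting on this imbalanced dataset?
RQ4: In this study, are ViT models more effective than the other models at detecting melanoma?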

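Key Findings

ViT_L32 achieves 91.57% accuracy and 58.54% melanoma recall.
ViT_L16 achieves 92.79% accuracy and 56.10% melanoma recall.
Both ViT_L16 and ViT_L32 outperform DTC (61.06% accuracy) and KNN (65.45% accuracy).
The ViT models also outperform the earlier CNN results and the smaller ViT configurations reported in related work.
The ablation study in the appendix shows that the various design choices affect accuracy, with the best configuration reaching 92.79%.
The ViT attention mechanism helps the models recognize lesions better than the baseline models.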
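Overall accuracy and melanoma recall measure different things, which is why the two are reported separately. The sketch below computes both from label lists, with recall defined as TP / (TP + FN) for the melanoma class. The label lists are hypothetical: 41 melanoma cases with 24 detected were chosen only because 24/41 rounds to the reported 58.54%. The actual test-set composition is not stated in this summary.

```python
def recall_for_class(y_true, y_pred, positive):
    """Recall = TP / (TP + FN) for one class treated as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError(f"no samples of class {positive!r} in y_true")
    return tp / (tp + fn)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical toy test set: 41 melanoma ("mel") and 59 nevus ("nv") images.
y_true = ["mel"] * 41 + ["nv"] * 59
# The toy model finds 24 melanomas, misses 17, and gets every nevus right.
y_pred = ["mel"] * 24 + ["nv"] * 17 + ["nv"] * 59

mel_recall = recall_for_class(y_true, y_pred, "mel")
overall_accuracy = accuracy(y_true, y_pred)

print(f"melanoma recall: {100 * mel_recall:.2f}%")
print(f"accuracy: {100 * overall_accuracy:.2f}%")
```

On an imbalanced dataset, the majority classes dominate accuracy, so a model can score above 90% accuracy while still missing a large share of melanomas.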

This review was generated by AI and reviewed by human editors.