QUICK REVIEW

[Paper Review] Skin Cancer Detection utilizing Deep Learning: Classification of Skin Lesion Images using a Vision Transformer

Carolin Flosdorf, Justin Engelker|arXiv (Cornell University)|Jul 26, 2024

Cutaneous Melanoma Detection and Management | Cited by 9

One-Sentence Summary

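The paper evaluates two pre-trained Vision Transformer (ViT) models, ViT_L16 and ViT_L32, on seven-class skin cancer classification over HAM10000, and reports that both surpass traditional classifiers and CNN baselines in accuracy, with ViT_L32 also reaching the highest melanoma recall.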

ABSTRACT

Skin cancer detection still represents a major challenge in healthcare. Common detection methods can be lengthy and require human assistance which falls short in many countries. Previous research demonstrates how convolutional neural networks (CNNs) can help effectively through both automation and an accuracy that is comparable to the human level. However, despite the progress in previous decades, the precision is still limited, leading to substantial misclassifications that have a serious impact on people's health. Hence, we employ a Vision Transformer (ViT) that has been developed in recent years based on the idea of a self-attention mechanism, specifically two configurations of a pre-trained ViT. We generally find superior metrics for classifying skin lesions after comparing them to base models such as decision tree classifier and k-nearest neighbor (KNN) classifier, as well as to CNNs and less complex ViTs. In particular, we attach greater importance to the performance of melanoma, which is the most lethal type of skin cancer. The ViT-L32 model achieves an accuracy of 91.57% and a melanoma recall of 58.54%, while ViT-L16 achieves an accuracy of 92.79% and a melanoma recall of 56.10%. This offers a potential tool for faster and more accurate diagnoses and an overall improvement for the healthcare sector.

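Motivation and Objectives

Advance automated, accurate skin cancer detection to address clinician shortages and long waiting times.
Evaluate whether pre-trained Vision Transformer models outperform CNNs and traditional classifiers at skin lesion classification.
Focus on detection performance (recall) for melanoma, which carries the highest mortality risk.
Use data augmentation to counter class imbalance, and evaluate on a held-out test set.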
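
To make the class-imbalance point concrete, here is a minimal sketch of how one might size an augmentation plan. The per-class counts are the commonly cited figures for the full HAM10000 dataset, not the paper's training split, and balancing every class up to the largest one is an assumption rather than the authors' stated procedure. The function name `augmentation_plan` is hypothetical.

```python
import math

# Commonly cited per-class image counts for the full HAM10000 dataset.
# Assumption: the paper's actual training split has different counts.
HAM10000_COUNTS = {
    "nv": 6705, "mel": 1113, "bkl": 1099, "bcc": 514,
    "akiec": 327, "vasc": 142, "df": 115,
}


def augmentation_plan(counts, target=None):
    """Return, per class, how many augmented variants of each original
    image are needed to reach `target` images (default: largest class)."""
    if target is None:
        target = max(counts.values())
    plan = {}
    for label, n in counts.items():
        missing = max(target - n, 0)
        # Round up so the class reaches at least `target` images.
        plan[label] = math.ceil(missing / n)
    return plan


plan = augmentation_plan(HAM10000_COUNTS)
for label, copies in plan.items():
    print(f"{label}: {copies} augmented variants per original image")
```

Under these assumptions the rarest classes need dozens of variants per image, which is why augmentation settings such as rotation, shift, brightness, and zoom ranges matter for avoiding near-duplicate samples.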

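Proposed Method

Use two pre-trained ViT configurations (ViT_L16 and ViT_L32) with 224x224 inputs and 7-class outputs.
Replace the ViT classification head with a 7-neuron softmax output, one neuron per skin cancer type.
Train with the SGD optimizer and cross-entropy loss, using early stopping, best-weights checkpointing, and learning-rate scheduling.
Apply data augmentation (rotation, shift, brightness, zoom) to address class imbalance.
Compare the ViT models against a decision tree classifier (DTC), KNN, a CNN baseline, and previously reported ViT/CNN results.
Report test-set accuracy and melanoma-specific recall, emphasizing detection of the most lethal cancer type.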
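
The training controls listed above (early stopping, best-weights checkpointing, learning-rate scheduling) can be sketched in a framework-agnostic way. This is an illustration only: the class name `TrainingController`, the patience values, the halving factor, and the reduce-on-plateau rule are all assumptions, since the review does not state the schedule or hyperparameters the authors used.

```python
import copy


class TrainingController:
    """Tracks validation loss to drive early stopping, best-weights
    checkpointing, and a reduce-on-plateau learning-rate schedule.
    All default hyperparameters here are illustrative assumptions."""

    def __init__(self, lr, stop_patience=5, lr_patience=2, lr_factor=0.5):
        self.lr = lr
        self.stop_patience = stop_patience
        self.lr_patience = lr_patience
        self.lr_factor = lr_factor
        self.best_loss = float("inf")
        self.best_weights = None
        self.epochs_since_best = 0

    def update(self, val_loss, weights):
        """Record one epoch; return True if training should stop."""
        if val_loss < self.best_loss:
            # New best: checkpoint a copy of the weights, reset the counter.
            self.best_loss = val_loss
            self.best_weights = copy.deepcopy(weights)
            self.epochs_since_best = 0
            return False
        self.epochs_since_best += 1
        # Shrink the learning rate every `lr_patience` stagnant epochs.
        if self.epochs_since_best % self.lr_patience == 0:
            self.lr *= self.lr_factor
        return self.epochs_since_best >= self.stop_patience


# Toy validation losses, invented purely to exercise the controller.
controller = TrainingController(lr=0.01, stop_patience=3, lr_patience=2)
val_losses = [0.90, 0.70, 0.75, 0.72, 0.71, 0.74]
stopped_at = None
for epoch, loss in enumerate(val_losses):
    weights = {"epoch": epoch}  # stand-in for real model weights
    if controller.update(loss, weights):
        stopped_at = epoch
        break

print(stopped_at, controller.best_weights, controller.lr)
```

In a real run, `weights` would be the model's state and the restored checkpoint would be the one evaluated on the held-out test set.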

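Experimental Results

Research Questions

RQ1: Do large pre-trained ViT models (ViT_L16, ViT_L32) outperform traditional ML models and CNN-based methods on the HAM10000 skin cancer dataset?
RQ2: What accuracy and melanoma recall do ViT_L16 and ViT_L32 achieve on seven-class skin cancer classification?
RQ3: How does data augmentation affect model performance and overfitting on this imbalanced dataset?
RQ4: Are the ViT models more effective at detecting melanoma than the other models in this study?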

Key Findings

Model	DTC	KNN-Classifier	ViT_L32	ViT_L16	ViT_B32*	ViT_B16*	CNN**
Accuracy	61.06%	65.45%	91.57%	92.79%	74.73%	81.88%	90.51%
Melanoma recall	24.78%	6.19%	58.54%	56.10%	41.03%	17.95%	57.57%
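
The gap between the two rows of the table follows from how the metrics are defined: accuracy counts correct predictions over all classes, while melanoma recall is TP / (TP + FN) over melanoma images only. The sketch below uses invented toy labels (not the paper's data) and hypothetical helper names to show how an imbalanced test set can yield high accuracy alongside modest recall for a rare class.

```python
def accuracy(y_true, y_pred):
    """Fraction of all samples whose predicted label is correct."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def class_recall(y_true, y_pred, label):
    """TP / (TP + FN) for a single class."""
    preds_for_class = [p for t, p in zip(y_true, y_pred) if t == label]
    if not preds_for_class:
        raise ValueError(f"no samples with true label {label!r}")
    true_positives = sum(p == label for p in preds_for_class)
    return true_positives / len(preds_for_class)


# Toy example (invented): 90 nevus images, 10 melanoma images.
# The classifier gets every nevus right but only 5 of 10 melanomas.
y_true = ["nv"] * 90 + ["mel"] * 10
y_pred = ["nv"] * 90 + ["mel"] * 5 + ["nv"] * 5

acc = accuracy(y_true, y_pred)
mel_recall = class_recall(y_true, y_pred, "mel")
print(f"accuracy={acc:.2%}, melanoma recall={mel_recall:.2%}")
```

This is why the review reports melanoma recall separately: a model can misclassify half of the melanomas in this toy case and still score 95% accuracy.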

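ViT_L32 achieves 91.57% accuracy and 58.54% melanoma recall.
ViT_L16 achieves 92.79% accuracy and 56.10% melanoma recall.
Both ViT_L16 and ViT_L32 outperform DTC (61.06% accuracy) and KNN (65.45% accuracy).
Both ViT models exceed the CNN result (90.51%) and the smaller ViT configurations from related work (ViT_B32, ViT_B16) in accuracy; in melanoma recall, only ViT_L32 (58.54%) exceeds the CNN (57.57%).
An ablation study in the appendix shows that various design choices affect accuracy, which peaks at 92.79%.
The ViT's attention mechanism improves lesion recognition relative to the baseline models.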


This review was generated by AI and checked by a human editor.