QUICK REVIEW

[Paper Review] Skin Lesion Analysis Toward Melanoma Detection 2018: A Challenge Hosted by the International Skin Imaging Collaboration (ISIC)

Noel Codella, Veronica Rotemberg | arXiv (Cornell University) | Feb 9, 2019

Cutaneous Melanoma Detection and Management | References: 7 | Cited by: 984

One-Line Summary

This paper summarizes the ISIC 2018 Challenge on Skin Lesion Analysis Toward Melanoma Detection, detailing its dataset, tasks, evaluation protocols, and results, as well as their implications for generalization and regulation.

ABSTRACT

This work summarizes the results of the largest skin image analysis challenge in the world, hosted by the International Skin Imaging Collaboration (ISIC), a global partnership that has organized the world's largest public repository of dermoscopic images of skin. The challenge was hosted in 2018 at the Medical Image Computing and Computer Assisted Intervention (MICCAI) conference in Granada, Spain. The dataset included over 12,500 images across 3 tasks. 900 users registered for data download, 115 submitted to the lesion segmentation task, 25 submitted to the lesion attribute detection task, and 159 submitted to the disease classification task. Novel evaluation protocols were established, including a new test for segmentation algorithm performance, and a test for algorithm ability to generalize. Results show that top segmentation algorithms still fail on over 10% of images on average, and algorithms with equal performance on test data can have different abilities to generalize. This is an important consideration for agencies regulating the growing set of machine learning tools in the healthcare domain, and sets a new standard for future public challenges in healthcare.

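Research Motivation and Objectives

Present the design and participation metrics of the ISIC 2018 Challenge.
Introduce new evaluation protocols, including Thresholded Jaccard and balanced accuracy.
Assess generalization using internal and external test splits.
Analyze results across the segmentation, attribute detection, and disease classification tasks.
Provide recommendations for future public challenges in medical machine learning.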

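Proposed Method

Split the challenge into three tasks: segmentation, attribute detection, and disease classification.
Use Thresholded Jaccard for segmentation to account for inter-observer variability.
Use balanced accuracy for classification to mitigate prevalence bias.
Include internal and external held-out test splits to assess generalization.
Have participants submit a four-page manuscript describing their method and disclose any use of in-domain or out-of-domain data.
Evaluate segmentation, attribute detection, and classification with task-specific metrics.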
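The Thresholded Jaccard metric listed above can be sketched in a few lines of Python. This is a minimal illustration, not the challenge's official scoring code. It assumes binary masks given as nested lists of 0/1, a per-image cutoff of 0.65 (the value we understand ISIC 2018 used; it is exposed as a parameter), and that two empty masks count as perfect agreement. The helper names `jaccard`, `thresholded_jaccard`, and `box` are hypothetical. The function also returns the fraction of images below the cutoff, which is one way to read the abstract's statement that top algorithms fail on over 10% of images.

```python
from typing import List, Sequence, Set, Tuple

# A 2-D binary segmentation mask: 1 marks a lesion pixel, 0 background.
Mask = Sequence[Sequence[int]]


def foreground(mask: Mask) -> Set[Tuple[int, int]]:
    """Return the (row, col) coordinates of all lesion pixels."""
    return {(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v}


def jaccard(pred: Mask, truth: Mask) -> float:
    """Jaccard index |A & B| / |A | B| between two binary masks."""
    a, b = foreground(pred), foreground(truth)
    union = a | b
    if not union:
        return 1.0  # assumption: two empty masks agree perfectly
    return len(a & b) / len(union)


def thresholded_jaccard(
    preds: Sequence[Mask], truths: Sequence[Mask], threshold: float = 0.65
) -> Tuple[float, float]:
    """Mean per-image Jaccard where any image below `threshold` scores 0.

    Returns (score, failure_rate); failure_rate is the fraction of images
    whose Jaccard fell below the threshold (hypothetical helper).
    """
    if len(preds) != len(truths) or not preds:
        raise ValueError("need equal, non-zero numbers of predictions and truths")
    scores = [jaccard(p, t) for p, t in zip(preds, truths)]
    kept = [s if s >= threshold else 0.0 for s in scores]
    failures = sum(s < threshold for s in scores)
    return sum(kept) / len(kept), failures / len(scores)


def box(r0: int, r1: int, c0: int, c1: int, size: int = 4) -> List[List[int]]:
    """Toy square mask with a filled rectangle [r0, r1) x [c0, c1)."""
    return [[int(r0 <= r < r1 and c0 <= c < c1) for c in range(size)] for r in range(size)]


if __name__ == "__main__":
    truth = box(1, 3, 1, 3)
    preds = [box(1, 3, 1, 3), box(1, 3, 1, 4), box(0, 2, 0, 2)]  # J = 1, 2/3, 1/7
    score, failure_rate = thresholded_jaccard(preds, [truth] * 3)
    print(f"{score:.3f} {failure_rate:.3f}")  # prints "0.556 0.333"
```

In this toy run a single poor mask pulls the score from a plain mean Jaccard of about 0.603 down to 0.556, so an algorithm cannot average away outright segmentation failures.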

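Experimental Results

Research Questions

RQ1: How do segmentation, attribute detection, and disease classification algorithms perform under the new evaluation protocols?
RQ2: Does Thresholded Jaccard reflect clinical utility in segmentation better than the standard Jaccard index?
RQ3: How does balanced accuracy affect rankings and generalization compared with other metrics?
RQ4: Can algorithms generalize from internal data to external data splits?
RQ5: What implications does the limited attribute detection performance have for clinical practice and future work?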

Key Findings

Top segmentation submissions reached a Thresholded Jaccard of around 0.80 but still failed on over 10% of images on average.
Attribute detection performance was low: the best Jaccard index, averaged across attributes, was around 0.473.
The highest disease classification balanced accuracy was 0.885, with notable gaps between internal and external test performance.
Algorithms often overfit to internal data; generalization varied across methods.
Balanced accuracy significantly impacts participant ranking compared with accuracy or AUC.
External test data revealed differences in performance not captured by internal test datasets.
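The ranking effect of balanced accuracy can be seen on a toy example. The Python sketch below computes balanced accuracy as the mean of per-class recall over the classes present in the ground truth, which is the standard definition; the challenge's official scoring code may differ in details. The labels `NV` (nevus) and `MEL` (melanoma) follow ISIC diagnosis codes, but the test set, the two models, and the helper names are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def _check(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> None:
    """Reject empty or mismatched label sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need equal, non-zero numbers of labels and predictions")


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of predictions that match the ground truth."""
    _check(y_true, y_pred)
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Mean per-class recall over the classes present in y_true."""
    _check(y_true, y_pred)
    totals = Counter(y_true)  # samples per true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


if __name__ == "__main__":
    # Hypothetical imbalanced test set: 8 nevi, 2 melanomas.
    y_true = ["NV"] * 8 + ["MEL"] * 2
    models = {
        "always_nv": ["NV"] * 10,               # never flags melanoma
        "sensitive": ["NV"] * 5 + ["MEL"] * 5,  # 3 false alarms, both melanomas found
    }
    for name, y_pred in models.items():
        print(f"{name}: accuracy={accuracy(y_true, y_pred):.4f}, "
              f"balanced={balanced_accuracy(y_true, y_pred):.4f}")
    # always_nv: accuracy=0.8000, balanced=0.5000
    # sensitive: accuracy=0.7000, balanced=0.8125
```

Plain accuracy ranks the model that never detects melanoma first, while balanced accuracy reverses the order. This illustrates the kind of ranking shift the findings attribute to the choice of metric on prevalence-skewed data.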

This review was generated by AI and checked by a human editor.