QUICK REVIEW

[Paper Review] Detecting Severity of Diabetic Retinopathy from Fundus Images: A Transformer Network-based Review

Tejas Mohan Karkera, Chandranath Adak|arXiv (Cornell University)|Jan 3, 2023

Retinal Imaging and Analysis | References: 58 | Cited by: 9

One-Line Summary

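This paper ensembles four image transformer models (ViT, BEiT, CaiT, DeiT) to automatically grade the severity of diabetic retinopathy (DR) from fundus images, and reports state-of-the-art accuracy on APTOS-2019.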

ABSTRACT

Diabetic Retinopathy (DR) is considered one of the significant concerns worldwide, primarily because it causes vision loss among most people with diabetes. The severity of DR is typically assessed manually by ophthalmologists from fundus photographs of the retina. This paper deals with automatically determining the severity stage of DR. In the literature, researchers have pursued this automation using traditional machine learning algorithms and convolutional architectures. However, past works rarely focused on the essential parts of the retinal image to improve model performance. In this study, we adopt and fine-tune transformer-based learning models to capture the crucial features of retinal images for a more nuanced understanding of DR severity. Additionally, we explore the effectiveness of image transformers in inferring the degree of DR severity from fundus photographs. For experiments, we used the publicly available APTOS-2019 blindness detection dataset, on which the transformer-based models performed encouragingly.

Motivation and Objectives

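Automate DR severity assessment from fundus photographs to reduce the inconsistency of manual grading.
Explore transformer-based architectures that capture the salient retinal features needed for DR staging.
Develop an ensemble of multiple image transformers to improve predictive performance on the DR severity task.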

Proposed Method

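Preprocess the fundus images with resizing, augmentation, and CLAHE (contrast-limited adaptive histogram equalization) to standardize the inputs.
Adapt and train four image transformers (ViT, BEiT, CaiT, DeiT) for DR severity classification.
Ensemble the four transformers by weighted mean and by majority voting to produce the final prediction.
Evaluate performance on the APTOS-2019 dataset with accuracy, Cohen's kappa, precision, recall, F1, specificity, and balanced accuracy.
Run ablation and hyperparameter analyses to assess the contribution of each transformer and of the number of MSA (multi-head self-attention) heads.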
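The two ensembling rules above can be sketched as follows. This is a minimal NumPy sketch, not the authors' code: it assumes each fine-tuned transformer outputs softmax probabilities over the five APTOS-2019 grades (0 to 4) and that the weighted mean is applied to those probabilities. The function names, the example probabilities, the α values, and the majority-vote tie-breaking rule are all hypothetical.

```python
import numpy as np

def weighted_mean_ensemble(probs, alphas):
    """Weighted mean of per-model class probabilities, then argmax.

    probs:  shape (n_models, n_samples, n_classes), e.g. softmax outputs
    alphas: one non-negative weight per model (hypothetical values here)
    """
    probs = np.asarray(probs, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    alphas = alphas / alphas.sum()                    # normalize weights to sum to 1
    fused = np.tensordot(alphas, probs, axes=(0, 0))  # (n_samples, n_classes)
    return fused.argmax(axis=1)

def majority_vote_ensemble(probs):
    """Each model votes for its top class; the most frequent class wins.

    Ties go to the lowest class index (an assumption; with four models,
    2-2 splits are possible and the paper's rule is not stated here).
    """
    probs = np.asarray(probs)
    votes = probs.argmax(axis=2)                      # (n_models, n_samples)
    n_classes = probs.shape[2]
    counts = np.stack([(votes == c).sum(axis=0) for c in range(n_classes)])
    return counts.argmax(axis=0)                      # argmax keeps the first max

# Hypothetical softmax outputs of three models for two images over five DR grades (0-4)
probs = [
    [[0.6, 0.1, 0.1, 0.1, 0.1], [0.1, 0.5, 0.2, 0.1, 0.1]],
    [[0.2, 0.5, 0.1, 0.1, 0.1], [0.1, 0.2, 0.5, 0.1, 0.1]],
    [[0.7, 0.1, 0.1, 0.05, 0.05], [0.1, 0.1, 0.6, 0.1, 0.1]],
]
print(weighted_mean_ensemble(probs, [1, 1, 1]))   # [0 2]
print(weighted_mean_ensemble(probs, [10, 1, 1]))  # [0 1]
print(majority_vote_ensemble(probs))              # [0 2]
```

The second call shows how raising one model's α can flip a prediction, which is why the α weights are treated as a tunable hyperparameter; majority voting, by contrast, ignores each model's confidence.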

Experimental Results

Research Questions

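RQ1: Can transformer-based models effectively learn DR-severity features from fundus images?
RQ2: Does ensembling multiple image transformers outperform single models at DR severity grading?
RQ3: How do preprocessing and hyperparameters affect transformer-based DR severity classification?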

Key Findings

Transformer(s)	Weighted Mean Accuracy (%)	Majority Voting Accuracy (%)
ViT	82.21	n/a
DeiT	85.65	n/a
BEiT	86.74	n/a
CaiT	86.91	n/a
ViT + DeiT	87.03	86.55
ViT + BEiT	87.48	87.03
ViT + CaiT	87.77	87.21
DeiT + BEiT	88.18	87.69
DeiT + CaiT	88.86	87.93
BEiT + CaiT	89.28	88.12
ViT + DeiT + BEiT	90.53	88.87
ViT + DeiT + CaiT	91.39	89.56
ViT + BEiT + CaiT	92.14	90.28
DeiT + BEiT + CaiT	93.46	90.91
ViT + DeiT + BEiT + CaiT	94.63	91.26
(Single-model rows give each transformer's standalone accuracy; majority voting does not apply to them.)

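The ensemble of all four image transformers (EiT: ViT + DeiT + BEiT + CaiT) achieves 94.63% accuracy with the weighted mean and 91.26% with majority voting on the APTOS-2019 test set.
With the weighted mean, EiT reaches a Cohen's kappa of 0.92 and a balanced accuracy of 95.75%, the highest among all configurations.
CaiT is the best single model (86.91%); ensembling generally improves accuracy, and the full four-model ensemble outperforms every individual model under both fusion rules.
Across severity classes, EiT shows high precision and recall for no DR (class 0); performance varies across the other stages, but specificity remains high overall.
In the hyperparameter analysis, increasing the number of MSA heads up to 6 improves performance, and tuning the α weights of the weighted mean yields the peak accuracy of 94.63%.
Compared with conventional CNN-based architectures and other transformers, EiT achieves higher accuracy, balanced accuracy, sensitivity, and specificity.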
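The metrics above can all be derived from a multiclass confusion matrix. The following is a minimal NumPy sketch under stated assumptions: it computes unweighted Cohen's kappa and uses the common multiclass definition of balanced accuracy (mean per-class recall). The paper may use other variants, for example quadratic weighted kappa (the official APTOS-2019 competition metric) or a sensitivity/specificity-based balanced accuracy, so the numbers are not guaranteed to match its tables. The function names and toy labels are hypothetical.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true grades, columns are predicted grades."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

def _safe_div(a, b):
    """Element-wise a / b, returning 0 where b == 0 (e.g. a class never predicted)."""
    return np.divide(a, b, out=np.zeros_like(a, dtype=float), where=b != 0)

def summarize(cm):
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp          # predicted as class c but actually another class
    fn = cm.sum(axis=1) - tp          # actually class c but predicted as another class
    tn = n - tp - fp - fn
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)   # per-class sensitivity
    specificity = _safe_div(tn, tn + fp)
    f1 = _safe_div(2 * precision * recall, precision + recall)
    accuracy = tp.sum() / n
    # Unweighted Cohen's kappa: observed agreement vs. agreement expected by chance
    p_e = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2
    kappa = (accuracy - p_e) / (1 - p_e)
    return {
        "accuracy": float(accuracy),
        "kappa": float(kappa),
        "balanced_accuracy": float(recall.mean()),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }

# Toy example with three grades (hypothetical labels, not APTOS-2019 data)
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
m = summarize(cm)
print(round(m["kappa"], 4))              # 0.75
print(round(m["balanced_accuracy"], 4))  # 0.8333
```

Per-class specificity stays high even when per-class recall drops, because each class's true negatives include every correctly handled image from all other classes.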

Figure 3: Internal view of a transformer encoder (TE).
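To make the encoder in Figure 3 concrete, here is a minimal NumPy sketch of one transformer encoder block, assuming the standard pre-norm ViT layout (LayerNorm, multi-head self-attention, residual; then LayerNorm, GELU MLP, residual). It is not taken from the paper: weights are random, biases and learnable LayerNorm parameters are omitted, and the token count (1 [CLS] + 16 patches), embedding size 48, and helper names are hypothetical. The `num_heads` argument corresponds to the MSA head count varied in the hyperparameter analysis; the embedding size must be divisible by it.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)      # learnable scale/shift omitted

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)     # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def multi_head_self_attention(x, w_qkv, w_out, num_heads):
    n_tokens, dim = x.shape
    assert dim % num_heads == 0, "embedding dim must be divisible by num_heads"
    head_dim = dim // num_heads
    q, k, v = np.split(x @ w_qkv, 3, axis=-1)   # each (n_tokens, dim)

    def split_heads(t):                         # -> (num_heads, n_tokens, head_dim)
        return t.reshape(n_tokens, num_heads, head_dim).transpose(1, 0, 2)

    q, k, v = split_heads(q), split_heads(k), split_heads(v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)  # scaled dot products
    out = softmax(scores, axis=-1) @ v          # (num_heads, n_tokens, head_dim)
    out = out.transpose(1, 0, 2).reshape(n_tokens, dim)    # concatenate heads
    return out @ w_out

def encoder_block(x, params, num_heads):
    # Sub-layer 1: MSA with residual connection
    x = x + multi_head_self_attention(layer_norm(x), params["w_qkv"], params["w_out"], num_heads)
    # Sub-layer 2: two-layer GELU MLP with residual connection
    return x + gelu(layer_norm(x) @ params["w_mlp1"]) @ params["w_mlp2"]

def init_params(dim, mlp_ratio=4, seed=0):
    rng = np.random.default_rng(seed)
    return {
        "w_qkv": rng.normal(0, 0.02, (dim, 3 * dim)),
        "w_out": rng.normal(0, 0.02, (dim, dim)),
        "w_mlp1": rng.normal(0, 0.02, (dim, mlp_ratio * dim)),
        "w_mlp2": rng.normal(0, 0.02, (mlp_ratio * dim, dim)),
    }

# Hypothetical input: 1 [CLS] token + 16 patch tokens, embedding size 48
x = np.random.default_rng(1).normal(size=(17, 48))
params = init_params(48)
y = encoder_block(x, params, num_heads=6)
print(y.shape)  # (17, 48)
```

Because each block maps a token sequence to a sequence of the same shape, encoder blocks can be stacked, and the [CLS] token's final embedding would feed a classification head over the DR grades.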

This review was generated by AI and checked by a human editor.