QUICK REVIEW

[Paper Review] Diabetic Retinopathy Detection via Deep Convolutional Networks for Discriminative Localization and Visual Explanation

Zhiguang Wang, Jianbo Yang|arXiv (Cornell University)|Mar 31, 2017

Retinal Imaging and Analysis|References: 20|Citations: 95

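One-line summary

A CNN-based diabetic retinopathy (DR) detection model that uses global average pooling and regression activation maps (RAM) to localize salient regions and explain its predictions visually, while reaching competitive performance with fewer parameters.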

ABSTRACT

We proposed a deep learning method for interpretable diabetic retinopathy (DR) detection. The visual-interpretable feature of the proposed method is achieved by adding the regression activation map (RAM) after the global averaging pooling layer of the convolutional networks (CNN). With RAM, the proposed model can localize the discriminative regions of an retina image to show the specific region of interest in terms of its severity level. We believe this advantage of the proposed deep learning model is highly desired for DR detection because in practice, users are not only interested with high prediction performance, but also keen to understand the insights of DR detection and why the adopted learning model works. In the experiments conducted on a large scale of retina image dataset, we show that the proposed CNN model can achieve high performance on DR detection compared with the state-of-the-art while achieving the merits of providing the RAM to highlight the salient regions of the input image.

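Motivation and objectives

Motivate automated DR detection that is interpretable, not just highly accurate.
Develop a CNN architecture with fewer parameters that supports visual explanation of its predictions.
Use RAM to localize the discriminative retinal regions that contribute to DR severity.
Evaluate on the large-scale Kaggle DR dataset and compare against state-of-the-art benchmarks.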

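Proposed method

Use a CNN that connects the last convolutional layer to the output through global average pooling (GAP) rather than fully connected layers.
Introduce Regression Activation Maps (RAM), computed as a weighted sum of the final-layer feature maps, to localize the regions that drive the prediction.
Train the network with a mean squared error loss to regress the DR severity score.
Generate RAMs at multiple input resolutions and fuse them to improve localization.
Compare performance against Kaggle benchmark methods, and report parameter counts and training time.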
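
The GAP-plus-RAM idea in the list above can be sketched in NumPy. This is a minimal sketch, not the authors' implementation: the array shapes, the random weights, and the function names (`global_average_pool`, `predict_severity`, `regression_activation_map`, `mse_loss`) are all hypothetical, and it assumes the output is a single linear unit on top of the pooled features.

```python
import numpy as np


def global_average_pool(feature_maps):
    """Average each channel over its spatial grid: (C, H, W) -> (C,)."""
    return feature_maps.mean(axis=(1, 2))


def predict_severity(feature_maps, weights, bias=0.0):
    """Regress one severity score from the pooled features (assumed linear output)."""
    return float(global_average_pool(feature_maps) @ weights + bias)


def regression_activation_map(feature_maps, weights):
    """Weighted sum of the final-layer feature maps: (C, H, W) -> (H, W)."""
    return np.tensordot(weights, feature_maps, axes=([0], [0]))


def mse_loss(predictions, targets):
    """Mean squared error between predicted and labeled severity scores."""
    predictions = np.asarray(predictions, dtype=float)
    targets = np.asarray(targets, dtype=float)
    return float(np.mean((predictions - targets) ** 2))


# Hypothetical data: 8 feature maps on a 4x4 grid and matching output weights.
rng = np.random.default_rng(0)
feature_maps = rng.random((8, 4, 4))
weights = rng.normal(size=8)

ram = regression_activation_map(feature_maps, weights)
score = predict_severity(feature_maps, weights)
```

Because pooling and the weighted sum are both linear, the spatial mean of `ram` equals `score` when the bias is zero. That is what lets each cell of the map be read as a regional contribution to the predicted severity.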

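Experimental results

Research questions

RQ1: Can RAM provide meaningful visual explanations for DR severity predictions?
RQ2: Does replacing fully connected layers with global average pooling (GAP) reduce parameters while maintaining predictive performance?
RQ3: Does fusing RAMs from multiple input resolutions improve localization and accuracy?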

Key findings

Metric	Baseline	Ours
Kappa score (public leaderboard)	0.8542	0.85034
Kappa score (private leaderboard)	0.8448	0.8412
Parameters (net-5)	12.4M	9.7M
Training time (net-5, s/epoch)	422.1	367.3
Parameters (net-4)	12.5M	9.8M
Training time (net-4, s/epoch)	451.7	398.2
RAM available	No	Yes
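
As a quick check on the table, the relative savings can be computed directly from its values. The snippet below is only illustrative arithmetic. It assumes each training-time row belongs to the network listed just above it, and `relative_reduction` is a hypothetical helper name.

```python
def relative_reduction(baseline, ours):
    """Fraction by which `ours` is smaller than `baseline`."""
    return (baseline - ours) / baseline


# (baseline, ours) pairs copied from the table.
# Pairing the training-time rows with net-5 / net-4 by row order is an assumption.
table = {
    "net-5 parameters (M)": (12.4, 9.7),
    "net-5 training time (s/epoch)": (422.1, 367.3),
    "net-4 parameters (M)": (12.5, 9.8),
    "net-4 training time (s/epoch)": (451.7, 398.2),
}

reductions = {name: relative_reduction(b, o) for name, (b, o) in table.items()}

for name, value in reductions.items():
    print(f"{name}: {value:.1%} lower than baseline")

# Absolute kappa gaps (baseline minus ours) on the two leaderboards.
kappa_gap_public = 0.8542 - 0.85034
kappa_gap_private = 0.8448 - 0.8412
```

Both networks come out a little under 22% smaller in parameter count and roughly 12 to 13% faster per epoch, while the kappa gap to the baseline stays below 0.004 on both leaderboards.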

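RAM localizes the discriminative retinal regions associated with each DR severity level.
The proposed method achieves kappa scores competitive with the benchmark while cutting parameters by about 22%.
Larger input images improve predictive performance up to 512 pixels, with no notable gain beyond that.
Fusing RAMs from 128-pixel and 256-pixel inputs yields more comprehensive ROIs that align better with the pathology.
RAM visualizations reveal clinically relevant features (e.g., microaneurysms, vascular changes), making the basis for the model's decisions transparent.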
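
The multi-resolution fusion mentioned above can be illustrated with a small NumPy sketch. The review does not state how the RAMs are combined, so the rule used here is purely an assumption: nearest-neighbor upsampling to a common grid, min-max normalization, then averaging. The function names and the toy arrays are also hypothetical.

```python
import numpy as np


def upsample_nearest(ram, factor):
    """Repeat every cell `factor` times along both axes."""
    return np.repeat(np.repeat(ram, factor, axis=0), factor, axis=1)


def min_max_normalize(ram):
    """Rescale a map to [0, 1]; a constant map becomes all zeros."""
    span = ram.max() - ram.min()
    if span == 0:
        return np.zeros_like(ram, dtype=float)
    return (ram - ram.min()) / span


def fuse_rams(rams, target_size):
    """Average square RAMs on a shared target_size x target_size grid (assumed rule)."""
    resized = []
    for ram in rams:
        if target_size % ram.shape[0] != 0:
            raise ValueError("target_size must be a multiple of each RAM side")
        factor = target_size // ram.shape[0]
        resized.append(min_max_normalize(upsample_nearest(ram, factor)))
    return np.mean(resized, axis=0)


# Toy stand-ins for RAMs obtained from a 128-pixel and a 256-pixel input.
ram_128 = np.array([[0.0, 1.0], [2.0, 3.0]])
ram_256 = np.arange(16, dtype=float).reshape(4, 4)

fused = fuse_rams([ram_128, ram_256], target_size=4)
```

Normalizing before averaging keeps a map with a larger numeric range from dominating the fused result. Other choices, such as bilinear resizing or a weighted average, would fit the same description.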

This review was generated by AI and checked by a human editor.