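[Paper Summary] Diabetic Retinopathy Detection via Deep Convolutional Networks for Discriminative Localization and Visual Explanation
A CNN-based DR detection model that uses global average pooling and Regression Activation Maps (RAM) to localize and visually explain salient regions, achieving competitive performance with fewer parameters.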
We propose a deep learning method for interpretable diabetic retinopathy (DR) detection. The method becomes visually interpretable by adding a regression activation map (RAM) after the global average pooling layer of the convolutional neural network (CNN). With RAM, the proposed model can localize the discriminative regions of a retina image and show the specific region of interest in terms of its severity level. We believe this advantage is highly desirable for DR detection because, in practice, users are not only interested in high prediction performance but also keen to understand what drives a DR prediction and why the adopted learning model works. In experiments conducted on a large-scale retina image dataset, we show that the proposed CNN model achieves performance on DR detection that is competitive with the state of the art, while also providing RAMs that highlight the salient regions of the input image.
Research Motivation and Objectives
- Motivate interpretable automated DR detection beyond high predictive accuracy.
- Develop a CNN architecture with reduced parameters that enables visual explanations of predictions.
- Localize discriminative retinal regions contributing to DR severity via RAM.
- Evaluate on a large Kaggle DR dataset and compare with a state-of-the-art benchmark.
Proposed Method
- Use CNNs without fully connected layers, relying on global average pooling (GAP) to connect the last convolutional layer to the output.
- Introduce Regression Activation Maps (RAM) as a weighted sum of last-layer feature maps to localize predictive regions.
- Train the networks with a mean squared error loss to regress DR severity scores.
- Generate RAMs at multiple input resolutions and fuse them to improve localization.
- Compare performance to a Kaggle benchmark method and report parameter counts and training times.
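The GAP-plus-RAM idea in the list above can be illustrated with a minimal NumPy sketch. It assumes the regression output is a linear function of the globally averaged feature maps, so the same weights can be reused to form a spatial map. The function names, the toy shapes, and the fusion rule (nearest-neighbour upsampling followed by averaging of min-max normalized maps) are assumptions for illustration only; the summary does not specify how the paper fuses RAMs.

```python
import numpy as np

def global_average_pool(feature_maps):
    """Reduce feature maps of shape (K, H, W) to K scalars."""
    return feature_maps.mean(axis=(1, 2))

def regression_activation_map(feature_maps, weights):
    """Weighted sum of the K feature maps, giving one (H, W) map."""
    return np.tensordot(weights, feature_maps, axes=1)

def upsample_nearest(ram, factor):
    """Nearest-neighbour upsampling by an integer factor (assumed choice)."""
    return np.repeat(np.repeat(ram, factor, axis=0), factor, axis=1)

def normalize(ram):
    """Min-max scale a map to [0, 1]; a constant map becomes all zeros."""
    lo, hi = ram.min(), ram.max()
    return (ram - lo) / (hi - lo) if hi > lo else np.zeros_like(ram)

def fuse_rams(ram_small, ram_large):
    """Hypothetical fusion: average the normalized maps at the larger size."""
    factor = ram_large.shape[0] // ram_small.shape[0]
    return 0.5 * (normalize(upsample_nearest(ram_small, factor)) + normalize(ram_large))

# Toy stand-ins for last-layer feature maps from two networks trained at
# different input resolutions (shapes and values are made up).
rng = np.random.default_rng(0)
feats_128 = rng.random((8, 4, 4))
feats_256 = rng.random((8, 8, 8))
w_128 = rng.normal(size=8)  # hypothetical output-layer weights
w_256 = rng.normal(size=8)

# Regression score (bias omitted) and the matching activation map.
score_256 = float(w_256 @ global_average_pool(feats_256))
ram_128 = regression_activation_map(feats_128, w_128)
ram_256 = regression_activation_map(feats_256, w_256)
fused = fuse_rams(ram_128, ram_256)
```

Because pooling and the weighted sum are both linear, the spatial mean of `ram_256` equals `score_256` in this sketch, which is what lets the map be read as a per-location contribution to the predicted severity.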
Experimental Results
Research Questions
- RQ1: Can RAM provide meaningful visual explanations for DR severity predictions?
- RQ2: Does replacing fully connected layers with GAP preserve predictive performance while reducing parameters?
- RQ3: Does fusing RAMs from multiple input resolutions improve localization and accuracy?
Key Findings
| Metric | Baseline | Ours |
|---|---|---|
| Kappa score (Public Leaderboard) | 0.8542 | 0.85034 |
| Kappa score (Private Leaderboard) | 0.8448 | 0.8412 |
| Parameter count (net-5) | 12.4M | 9.7M |
| Training time (net-5, seconds/epoch) | 422.1 | 367.3 |
| Parameter count (net-4) | 12.5M | 9.8M |
| Training time (net-4, seconds/epoch) | 451.7 | 398.2 |
| RAM | No | Yes |
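The relative parameter savings implied by the table can be checked with a few lines of arithmetic. The sketch below uses only the parameter counts from the table; the helper name `relative_reduction` is introduced here for illustration.

```python
def relative_reduction(baseline, ours):
    """Fraction by which `ours` is smaller than `baseline`."""
    return (baseline - ours) / baseline

# Parameter counts in millions, copied from the table: (baseline, ours).
param_counts = {"net-5": (12.4, 9.7), "net-4": (12.5, 9.8)}

reductions = {
    name: relative_reduction(baseline, ours)
    for name, (baseline, ours) in param_counts.items()
}

for name, value in reductions.items():
    print(f"{name}: {value:.1%} fewer parameters")
```

Both networks come out at a little under 22%, consistent with the rounded figure quoted for the parameter reduction.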
- RAM enables localization of discriminative retinal regions corresponding to DR severity levels.
- The proposed approach achieves competitive Kappa scores compared with the benchmark while reducing parameters by ~22%.
- Larger input images improve prediction performance up to 512 pixels, with no substantial gains beyond that size.
- Fusing RAMs from 128-pixel and 256-pixel inputs yields more comprehensive ROIs and better alignment with pathology.
- The RAM visualizations reveal clinically relevant features (e.g., microaneurysms, vessel changes) and provide transparent model decision evidence.
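For readers unfamiliar with the Kappa scores reported above, the following is a minimal sketch of quadratic weighted kappa, the agreement metric used for Kaggle DR leaderboards. It assumes five integer severity grades (0 to 4) and converts continuous regression outputs to grades by rounding and clipping; that conversion is an assumption for illustration, not necessarily what the paper does.

```python
import numpy as np

def scores_to_grades(scores, num_classes=5):
    """Round continuous scores to the nearest grade in [0, num_classes - 1]."""
    return np.clip(np.rint(scores), 0, num_classes - 1).astype(int)

def quadratic_weighted_kappa(y_true, y_pred, num_classes=5):
    """Agreement between two integer gradings, penalizing large disagreements more."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)

    # Observed confusion matrix: rows are true grades, columns are predictions.
    observed = np.zeros((num_classes, num_classes))
    for t, p in zip(y_true, y_pred):
        observed[t, p] += 1

    # Quadratic disagreement weights: 0 on the diagonal, 1 at the corners.
    idx = np.arange(num_classes)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (num_classes - 1) ** 2

    # Expected matrix under independence of the two gradings.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / len(y_true)

    # Undefined (division by zero) if both gradings use a single identical class.
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()

# Toy example with made-up labels and regression outputs.
true_grades = [0, 1, 2, 3, 4]
raw_scores = np.array([0.2, 1.1, 2.4, 2.9, 4.6])
predicted_grades = scores_to_grades(raw_scores)
kappa = quadratic_weighted_kappa(true_grades, predicted_grades)
```

A value of 1 indicates perfect agreement, 0 indicates chance-level agreement, and negative values indicate systematic disagreement.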
This summary was generated by AI and reviewed by human editors.