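[Paper Explained] Weakly Supervised Medical Diagnosis and Localization from Multiple Resolutions
The paper presents a multi-resolution, weakly supervised framework that diagnoses and localizes chest X-ray abnormalities using only image-level labels, and introduces a new LSE-LBA pooling function for adaptive saliency maps.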
Diagnostic imaging often requires the simultaneous identification of a multitude of findings of varied size and appearance. Beyond global indication of said findings, the prediction and display of localization information improves trust in and understanding of results when augmenting clinical workflow. Medical training data rarely includes more than global image-level labels as segmentations are time-consuming and expensive to collect. We introduce an approach to managing these practical constraints by applying a novel architecture which learns at multiple resolutions while generating saliency maps with weak supervision. Further, we parameterize the Log-Sum-Exp pooling function with a learnable lower-bounded adaptation (LSE-LBA) to build in a sharpness prior and better handle localizing abnormalities of different sizes using only image-level labels. Applying this approach to interpreting chest x-rays, we set the state of the art on 9 abnormalities in the NIH's CXR14 dataset while generating saliency maps with the highest resolution to date.
Research Motivation and Goals
- Motivate using multi-resolution analysis to improve localization of varied-size chest X-ray findings.
- Develop a weakly supervised framework that produces high-resolution saliency maps without ROI annotations.
- Introduce a learnable Log-Sum-Exp pooling with lower-bounded adaptation (LSE-LBA) to handle varying lesion sizes.
- Demonstrate state-of-the-art classification and localization on the NIH Chest X-ray (CXR14) dataset.
Proposed Method
- Propose a multi-resolution architecture that fuses features from high and low resolutions using dense connections.
- Reduce spatial resolution with ResNet blocks and preserve resolution with DenseNet-style connections.
- Iteratively upsample and concatenate features across resolutions to form coarse-to-fine representations for localization.
- Use a weakly supervised MIL framework where image-level labels supervise per-instance predictions across a 2D saliency map.
- Introduce LSE-LBA pooling: $p = \frac{1}{r} \log\left( \frac{1}{wh} \sum_{i,j} \exp(r \, S_{ij}) \right)$ with $r = r_0 + \exp(\beta)$, where $S$ is the $w \times h$ saliency map, $\beta$ is learned, and $r_0$ is a fixed lower bound on $r$ that acts as a sharpness prior.
- Apply sigmoid(WI_n(x)) to obtain per-instance class probabilities, then pool via LSE-LBA to yield image-level predictions and train with multi-label cross-entropy.
- Train from scratch on the NIH Chest X-ray dataset with standard data augmentation and Adam optimization; evaluate using per-abnormality AUC for classification and continuous Dice for localization.
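The pooling and loss steps listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names (`lse_lba_pool`, `multilabel_cross_entropy`), the toy 3 × 3 logit map, and the max-shift used for numerical stability are all assumptions made for this example. In the paper, beta is a learned parameter; here it is simply passed in as a number.

```python
import math


def sigmoid(x):
    """Map a raw per-instance score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def lse_lba_pool(saliency, beta, r0=0.0):
    """Pool a 2D saliency map (list of rows) into one image-level score.

    Computes p = (1/r) * log( (1/(w*h)) * sum_ij exp(r * S_ij) )
    with r = r0 + exp(beta), so r can never drop below r0.
    """
    r = r0 + math.exp(beta)
    values = [s for row in saliency for s in row]
    m = max(values)
    # Subtracting the max before exponentiating is an implementation
    # detail assumed here for numerical stability; the result is unchanged.
    mean_exp = sum(math.exp(r * (s - m)) for s in values) / len(values)
    return m + math.log(mean_exp) / r


def multilabel_cross_entropy(preds, labels, eps=1e-7):
    """Mean binary cross-entropy over classes for one image."""
    total = 0.0
    for p, y in zip(preds, labels):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(preds)


# Toy example: one class, a single strongly activated location.
logits = [[-4.0, -4.0, -4.0],
          [-4.0, 3.0, -4.0],
          [-4.0, -4.0, -4.0]]
saliency = [[sigmoid(v) for v in row] for row in logits]

for r0 in (0.0, 5.0, 10.0):
    p = lse_lba_pool(saliency, beta=0.0, r0=r0)
    loss = multilabel_cross_entropy([p], [1])
    print(f"r0={r0:>4}: pooled={p:.4f}  loss={loss:.4f}")
```

The pooled score always lies between the mean and the maximum of the map. A small r behaves like average pooling, while a larger r moves the score toward max pooling, which is why the lower bound r0 can be read as a prior on how focal the activations are expected to be.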
Experimental Results
Research Questions
- RQ1: Can multi-resolution, weakly supervised learning produce accurate pathology localization from only image-level labels in chest X-rays?
- RQ2: Does the LSE-LBA pooling provide robust, high-resolution saliency maps across abnormalities of different sizes?
- RQ3: How does multi-resolution fusion affect classification performance (AUC) and localization accuracy (Dice) on NIH CXR14?
- RQ4: What is the impact of the sharpness prior parameter r0 on localization and classification across various pathologies?
Key Findings
- Achieves state-of-the-art or competitive AUC on 9 of 14 abnormalities on NIH Chest X-ray test set without using localization labels or pretraining on ImageNet for most cases.
- Produces high-resolution probabilistic saliency maps showing improved localization for both focal and diffuse abnormalities as r0 increases.
- Localization performance (Dice) generally improves with a moderate sharpness prior (r0 around 5) and may degrade for very large r0 on diffuse abnormalities.
- Classification performance is robust to r0, while localization is more sensitive to r0, with best results for a balance between sharpness and coverage.
- Outperforms prior weakly supervised methods on several abnormalities and approaches or matches state-of-the-art on others without extra labeled data.
This explainer was generated by AI and reviewed by a human editor.