QUICK REVIEW

[Paper Review] HiResCAM: Faithful Location Representation in Visual Attention for Explainable 3D Medical Image Classification

Rachel Lea Draelos, Lawrence Carin|arXiv (Cornell University)|Nov 17, 2020
Radiomics and Machine Learning in Medical Imaging|36 citations
TL;DR

HiResCAM is a novel, label-specific attention mechanism that guarantees faithful localization of features used by a 3D CNN for multilabel classification in medical imaging, overcoming Grad-CAM's gradient averaging flaw. It achieves a 37% improvement in weakly supervised organ localization on RAD-ChestCT, setting a new state of the art.

ABSTRACT

Understanding model predictions is critical in healthcare, to facilitate rapid verification of model correctness and to guard against the use of models that exploit confounding variables. Here we address the challenging new task of explainable multilabel classification of volumetric medical images. We first illustrate a previously unrecognized limitation of the popular model explanation method Grad-CAM: as a side effect of the gradient averaging step, Grad-CAM sometimes highlights the wrong location. To solve this problem, we propose HiResCAM, a novel label-specific attention mechanism that is provably guaranteed to highlight only the locations the model used to make each prediction. Next, we introduce a mask loss that leverages HiResCAM to encourage the model to predict abnormalities based only on the organs in which those abnormalities appear. Our innovations produce a 37% improvement in weakly supervised organ localization of multiple abnormalities in the RAD-ChestCT data set of 36,316 CT volumes, resulting in state-of-the-art performance. We also demonstrate on PASCAL VOC 2012 the different properties of HiResCAM and Grad-CAM on natural images. Overall, this work advances convolutional neural network explanation approaches and the clinical applicability of multiple abnormality modeling in volumetric medical images.

Motivation & Objective

  • To address the unreliability of Grad-CAM in localizing relevant features in 3D medical images due to gradient averaging.
  • To develop a method that ensures attention maps reflect only the true decision-relevant regions for each label.
  • To improve weakly supervised localization of multiple abnormalities in volumetric CT scans.
  • To enhance clinical trust in deep learning models by ensuring explanations are faithful and interpretable.
  • To establish a new benchmark for explainable multilabel classification in 3D medical imaging.

Proposed method

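  • Propose HiResCAM, a label-specific attention mechanism that computes the gradient of each class score with respect to the feature maps and multiplies it element-wise with those feature maps, without first averaging the gradient over spatial locations as Grad-CAM does.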
  • Introduce a mask loss that encourages the model to attend only to organs containing the predicted abnormalities, improving localization fidelity.
  • Train the model end-to-end with the mask loss to enforce attention on relevant anatomical regions per label.
  • Use a gradient-based saliency method that preserves spatial resolution and avoids suppression of salient features due to averaging.
  • Apply the method to 3D volumetric CT data and evaluate on both medical and natural image benchmarks.
  • Validate the method on RAD-ChestCT and PASCAL VOC 2012 to compare with Grad-CAM and assess generalization.
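
The difference between the two explanation methods can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the array shapes, the random features, and the linear final layer (`weights`, `bias`) are assumptions chosen so that the gradient is known in closed form, and the ReLU that Grad-CAM usually applies to its output is omitted. Under that assumed linear final layer, the HiResCAM map sums (plus the bias) to the class score, which is the sense in which it reflects the computation the model performed.

```python
import numpy as np

def grad_cam(features, grads):
    """Grad-CAM: average each feature map's gradient over space, then weight the maps."""
    spatial_axes = tuple(range(1, features.ndim))
    alphas = grads.mean(axis=spatial_axes, keepdims=True)  # one scalar per feature map
    return (alphas * features).sum(axis=0)

def hires_cam(features, grads):
    """HiResCAM: multiply gradients and features element-wise, then sum over feature maps."""
    return (grads * features).sum(axis=0)

# Hypothetical 3D feature volume: (feature maps, depth, height, width).
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 2, 3, 3))

# Assumed final layer: one linear unit per label acting on the flattened features.
weights = rng.normal(size=features.shape)
bias = 0.5
score = float((weights * features).sum() + bias)

# For a linear final layer, d(score)/d(features) is simply the weight tensor.
grads = weights

cam_hires = hires_cam(features, grads)
cam_grad = grad_cam(features, grads)

print("score:                 ", score)
print("HiResCAM sum + bias:   ", float(cam_hires.sum() + bias))
print("Grad-CAM sum + bias:   ", float(cam_grad.sum() + bias))
```

In this setup the first two printed numbers agree, while the Grad-CAM total generally does not, because the spatial averaging discards where each gradient value applied.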

Experimental results

Research questions

  • RQ1: Does Grad-CAM produce misleading attention maps due to gradient averaging in 3D medical images?
  • RQ2: Can a label-specific attention mechanism ensure faithful localization of decision-relevant regions in multilabel 3D classification?
  • RQ3: Does introducing a mask loss that constrains attention to organ-specific regions improve weakly supervised localization performance?
  • RQ4: How does HiResCAM compare to Grad-CAM in terms of localization accuracy and faithfulness on natural and medical images?
  • RQ5: Can the proposed method achieve state-of-the-art performance in weakly supervised organ localization of multiple abnormalities?
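
The concern behind the first research question can be made concrete with a hand-built toy case. The 2×2 feature map and gradient below are invented for illustration and do not come from the paper: the gradient is nonzero at only one location, so only that location influences the score, yet the spatially averaged Grad-CAM weight moves the peak of the map to a location with zero gradient.

```python
import numpy as np

# One hypothetical 2x2 feature map and the gradient of the class score w.r.t. it.
features = np.array([[2.0, 0.0],
                     [0.0, 1.0]])
grads = np.array([[0.0, 0.0],
                  [0.0, 3.0]])  # only location (1, 1) affects the score

# Grad-CAM: a single averaged weight for the whole feature map.
alpha = grads.mean()
grad_cam_map = alpha * features

# HiResCAM: element-wise product, no averaging.
hires_cam_map = grads * features

grad_cam_peak = np.unravel_index(np.argmax(grad_cam_map), grad_cam_map.shape)
hires_cam_peak = np.unravel_index(np.argmax(hires_cam_map), hires_cam_map.shape)

print("Grad-CAM peak location:", grad_cam_peak)
print("HiResCAM peak location:", hires_cam_peak)
```

Here the Grad-CAM map peaks where the activation is largest, even though the gradient there is zero, while the HiResCAM map is nonzero only at the location the score depends on.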

Key findings

  • HiResCAM successfully eliminates the misleading localization artifacts caused by gradient averaging in Grad-CAM.
  • The method achieves a 37% improvement in weakly supervised organ localization performance on the RAD-ChestCT dataset of 36,316 CT volumes.
  • HiResCAM produces more faithful and localized attention maps compared to Grad-CAM, especially in complex 3D medical volumes.
  • The mask loss effectively encourages the model to attend only to organs containing the predicted abnormalities, improving localization fidelity.
  • On PASCAL VOC 2012, HiResCAM and Grad-CAM show different properties on natural images, indicating that the gradient averaging effect is not specific to medical volumes.
  • The approach sets a new state of the art in explainable multilabel classification of volumetric medical images.
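
One way to picture the mask loss described above is as a penalty on attention that falls outside the organs where a given abnormality is allowed to appear. The sketch below is an assumption about its general shape, not the paper's exact formulation: the function name `mask_loss`, the sigmoid squashing, and the cross-entropy style penalty are all illustrative choices, and the attention maps and organ mask are made-up toy volumes.

```python
import numpy as np

def mask_loss(attention, allowed_mask, eps=1e-7):
    """Penalize attention outside the allowed region (hypothetical formulation).

    attention:    raw attention map for one label, any spatial shape
    allowed_mask: same shape, 1 inside the allowed organ(s), 0 elsewhere
    """
    prob = 1.0 / (1.0 + np.exp(-attention))   # squash attention into (0, 1)
    outside = allowed_mask == 0
    if not outside.any():
        return 0.0
    # Cross-entropy against a target of 0 for every voxel outside the mask.
    return float(-np.log(np.clip(1.0 - prob[outside], eps, 1.0)).mean())

# Toy volume: the first depth slice stands in for the allowed organ.
allowed_mask = np.zeros((2, 3, 3))
allowed_mask[0] = 1

# Attention concentrated inside the allowed region.
attention_inside = np.full((2, 3, 3), -6.0)
attention_inside[0] = 6.0

# Attention concentrated outside the allowed region.
attention_outside = -attention_inside

loss_inside = mask_loss(attention_inside, allowed_mask)
loss_outside = mask_loss(attention_outside, allowed_mask)

print("loss, attention inside organ: ", loss_inside)
print("loss, attention outside organ:", loss_outside)
```

In this sketch voxels inside the mask contribute nothing, so the model is discouraged from looking outside the relevant organ without being forced to attend to the whole organ. In training, such a term would be added to the classification loss with some weighting, which is a further assumption here.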


This review was created by AI and reviewed by human editors.