[Paper Review] POTHER: Patch-Voted Deep Learning-Based Chest X-ray Bias Analysis for COVID-19 Detection
POTHER proposes a multi-task, patch-voted deep learning framework for explainable COVID-19 detection in chest X-rays, using lung segmentation with attention mechanisms to limit patch sampling to lung regions and reduce reliance on confounding factors like ECG leads and lateral markers. The method achieves a high F1 score of 0.974 for COVID-19 classification while demonstrating robustness to biases through explainable AI analysis.
A critical step in the fight against COVID-19, which continues to have a catastrophic impact on people's lives, is the effective screening of patients presenting at clinics with severe COVID-19 symptoms. Chest radiography is one of the promising screening approaches. Many studies have reported detecting COVID-19 in chest X-rays accurately using deep learning. A serious limitation of many published approaches is the insufficient attention paid to explaining the decisions made by deep learning models. Using explainable artificial intelligence methods, we demonstrate that model decisions may rely on confounding factors rather than medical pathology. After an analysis of potential confounding factors found on chest X-ray images, we propose a novel method to minimise their negative impact. We show that our proposed method is more robust than previous attempts to counter confounding factors such as ECG leads in chest X-rays that often influence model classification decisions. In addition to being robust, our method achieves results comparable to the state-of-the-art. The source code and pre-trained weights are publicly available at https://github.com/tomek1911/POTHER.
Motivation & Objective
- To address the critical issue of confounding biases in deep learning models for COVID-19 detection from chest X-rays, such as ECG leads, lateral markers, and hospital-specific markings.
- To develop a robust, explainable AI method that minimizes reliance on non-pathological features while maintaining high diagnostic performance.
- To improve model generalization by focusing on lung-adjacent patches through a multi-task learning framework with segmentation and classification.
Proposed method
- A U-Net-based encoder-decoder network with attention mechanisms is used to generate pseudo-segmentation masks for lung fields from raw CXR images.
- Patches are cropped only from the lung region and its immediate vicinity, reducing exposure to confounding artifacts like ECG leads and lateral markers.
- A multi-task learning setup jointly optimizes for lung segmentation and patch-level classification, enhancing feature representation.
- Patch-level predictions are aggregated via majority voting to produce the final image-level classification.
- Explainable AI techniques, including patch-based activation maps and GradCAM, are used to visualize attention and validate model decisions.
- A mask filtration algorithm is applied to refine segmentation outputs and ensure only relevant regions contribute to patch sampling.
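The patch sampling and voting steps listed above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `sample_lung_patches` and `majority_vote`, the choice to draw patch centres uniformly from mask pixels, the zero padding at image borders, and the tie-breaking rule are all assumptions made for illustration. The lung mask is taken as given here, whereas in POTHER it comes from the segmentation branch.

```python
import numpy as np


def sample_lung_patches(image, mask, patch_size, n_patches, rng):
    """Crop square patches whose centres lie inside a binary lung mask.

    Assumption: centres are drawn uniformly from mask pixels; the paper's
    exact sampling rule may differ.
    """
    half = patch_size // 2
    # Zero-pad so patches centred near the image border stay in bounds.
    padded = np.pad(image, half, mode="constant")
    ys, xs = np.nonzero(mask)
    if len(ys) == 0:
        raise ValueError("mask contains no lung pixels")
    idx = rng.integers(0, len(ys), size=n_patches)
    # Pixel (y, x) of the original sits at (y + half, x + half) in `padded`,
    # so this slice is (approximately) centred on the chosen lung pixel.
    patches = [
        padded[y:y + patch_size, x:x + patch_size]
        for y, x in zip(ys[idx], xs[idx])
    ]
    return np.stack(patches)


def majority_vote(patch_labels):
    """Return the most frequent patch-level class label.

    Assumption: ties are resolved in favour of the lowest class index.
    """
    counts = np.bincount(np.asarray(patch_labels))
    return int(np.argmax(counts))


# Toy usage: a synthetic 64x64 image with a rectangular "lung" mask.
rng = np.random.default_rng(0)
image = rng.random((64, 64))
mask = np.zeros((64, 64), dtype=bool)
mask[20:40, 10:30] = True
patches = sample_lung_patches(image, mask, patch_size=16, n_patches=9, rng=rng)

# Placeholder per-patch predictions standing in for a trained classifier.
patch_labels = [1, 1, 0, 2, 1, 1, 0, 1, 2]
image_label = majority_vote(patch_labels)
```

Because every patch is anchored on a lung pixel, artifacts far from the lungs (for example lateral markers in an image corner) are rarely included in any patch, which is the intuition behind restricting sampling to the lung region.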
Experimental results
Research questions
- RQ1: To what extent do deep learning models for COVID-19 detection in CXRs rely on confounding biases such as ECG leads and lateral markers rather than actual pathological features?
- RQ2: Can a patch-voted, multi-task learning approach reduce model sensitivity to known confounding factors in the COVIDx dataset?
- RQ3: How does limiting patch sampling to lung-adjacent regions affect model robustness and performance compared to global feature-based methods?
- RQ4: Can attention-based segmentation improve feature extraction and classification accuracy when training data is limited to lung fragments?
- RQ5: Does the proposed method maintain high performance while reducing reliance on non-medical artifacts in CXR images?
Key findings
- POTHER achieved an F1 score of 0.974 for the COVID-19 class on the COVIDx test set, a higher F1 than the compared models at comparable accuracy.
- The model demonstrated reduced sensitivity to confounding biases such as ECG leads and lateral markers (e.g., 'L' or 'R' labels), as confirmed by activation map analysis.
- The use of attention-enhanced segmentation and localized patch sampling significantly improved robustness compared to global feature-based models.
- The method achieved a precision of 1.000 and recall of 0.950 for the COVID-19 class, meaning no false-positive COVID-19 predictions and detection of 95% of the COVID-19 cases in the test set.
- Explainable AI analysis revealed that the model’s attention was primarily focused on lung regions and pathological patterns, not on irrelevant markers or artifacts.
- The source code and pre-trained weights are publicly available at https://github.com/tomek1911/POTHER, supporting reproducibility and further research.
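As a quick consistency check on the figures above, the reported F1 score for the COVID-19 class follows from the reported precision and recall, since F1 is their harmonic mean. The helper name `f1_score` below is ours and is not taken from the POTHER repository.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported COVID-19 class metrics: precision 1.000, recall 0.950.
covid_f1 = f1_score(1.000, 0.950)
print(round(covid_f1, 3))  # prints 0.974
```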
This review was created by AI and reviewed by human editors.