[Paper Review] Deep-learning models improve on community-level diagnosis for common congenital heart disease lesions
This study develops and evaluates deep learning models for automated prenatal diagnosis of two common congenital heart defects, tetralogy of Fallot (TOF) and hypoplastic left heart syndrome (HLHS), from fetal echocardiograms. On a holdout test set, the models identify the five canonical cardiac views with an F-score of 0.95 and detect TOF (75% sensitivity, 76% specificity) and HLHS (100% sensitivity, 90% specificity), well above the 30–50% fetal diagnosis rate reported for community settings.
Prenatal diagnosis of tetralogy of Fallot (TOF) and hypoplastic left heart syndrome (HLHS), two serious congenital heart defects, improves outcomes and can in some cases facilitate in utero interventions. In practice, however, the fetal diagnosis rate for these lesions is only 30-50 percent in community settings. Improving fetal diagnosis of congenital heart disease is therefore critical. Deep learning is a cutting-edge machine learning technique for finding patterns in images but has not yet been applied to prenatal diagnosis of congenital heart disease. Using 685 retrospectively collected echocardiograms from fetuses 18-24 weeks of gestational age from 2000-2018, we trained convolutional and fully-convolutional deep learning models in a supervised manner to (i) identify the five canonical screening views of the fetal heart and (ii) segment cardiac structures to calculate fetal cardiac biometrics. We then trained models to distinguish by view between normal hearts, TOF, and HLHS. In a holdout test set of images, F-score for identification of the five most important fetal cardiac views was 0.95. Binary classification of unannotated cardiac views of normal heart vs. TOF reached an overall sensitivity of 75% and a specificity of 76%, while normal vs. HLHS reached a sensitivity of 100% and specificity of 90%, both well above average diagnostic rates for these lesions. Furthermore, segmentation-based measurements for cardiothoracic ratio (CTR), cardiac axis (CA), and ventricular fractional area change (FAC) were compatible with clinically measured metrics for normal, TOF, and HLHS hearts. Thus, using guideline-recommended imaging, deep learning models can significantly improve detection of fetal congenital heart disease compared to the common standard of care.
Motivation & Objective
- To improve prenatal detection rates of serious congenital heart defects like TOF and HLHS, which are currently diagnosed in only 30–50% of cases in community settings.
- To develop deep learning models that can automatically identify standard fetal cardiac views from echocardiograms using supervised training.
- To assess the performance of these models in distinguishing normal hearts from TOF and HLHS using both classification and segmentation-based biometric measurements.
- To evaluate whether deep learning can enhance diagnostic accuracy beyond current clinical practice in routine community-level settings.
Proposed method
- Retrospective analysis of 685 fetal echocardiograms from 18–24 weeks' gestation collected between 2000 and 2018.
- Training of convolutional and fully-convolutional neural networks to detect the five canonical fetal cardiac screening views.
- Use of supervised learning to classify unannotated cardiac views, view by view, in two binary tasks: normal vs. TOF and normal vs. HLHS.
- Implementation of semantic segmentation to measure key biometrics: cardiothoracic ratio (CTR), cardiac axis (CA), and ventricular fractional area change (FAC).
- Validation on a holdout test set to evaluate F-score, sensitivity, and specificity for lesion detection.
- Comparison of model-derived biometrics with clinically measured values to assess compatibility.
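The biometric step above can be illustrated with a minimal sketch that derives CTR, FAC, and an axis angle from binary segmentation masks. The review does not give the paper's exact formulas, so everything here is an assumption: CTR is taken as an area ratio (heart area over thorax area), FAC as the relative change between the largest and smallest chamber area across frames, and cardiac axis as the angle between two landmark vectors. The function names and the toy masks are hypothetical.

```python
import numpy as np


def area(mask: np.ndarray) -> int:
    """Pixel count of a boolean segmentation mask."""
    return int(np.count_nonzero(mask))


def cardiothoracic_ratio(heart_mask: np.ndarray, thorax_mask: np.ndarray) -> float:
    """Area-based CTR (assumed definition): heart area / thorax area."""
    thorax_area = area(thorax_mask)
    if thorax_area == 0:
        raise ValueError("thorax mask is empty")
    return area(heart_mask) / thorax_area


def fractional_area_change(chamber_masks) -> float:
    """FAC (assumed definition): (A_max - A_min) / A_max over per-frame masks."""
    areas = [area(m) for m in chamber_masks]
    a_max, a_min = max(areas), min(areas)
    if a_max == 0:
        raise ValueError("chamber masks are empty")
    return (a_max - a_min) / a_max


def angle_between_degrees(u, v) -> float:
    """Angle between two 2D vectors, e.g. chest midline and septum (assumed landmarks)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    if denom == 0:
        raise ValueError("zero-length vector")
    cos_angle = np.clip(np.dot(u, v) / denom, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_angle)))


# Toy masks (hypothetical, not study data).
thorax = np.zeros((10, 10), dtype=bool)
thorax[1:9, 1:9] = True          # 64 pixels
heart = np.zeros((10, 10), dtype=bool)
heart[3:7, 3:7] = True           # 16 pixels

diastole = np.zeros((10, 10), dtype=bool)
diastole[2:6, 2:6] = True        # 16 pixels
systole = np.zeros((10, 10), dtype=bool)
systole[2:6, 2:5] = True         # 12 pixels

ctr = cardiothoracic_ratio(heart, thorax)           # 16 / 64 = 0.25
fac = fractional_area_change([diastole, systole])   # (16 - 12) / 16 = 0.25
axis = angle_between_degrees((0, 1), (1, 1))        # about 45 degrees
```

Working from pixel counts keeps the measurements independent of image resolution, since both CTR and FAC are ratios. How the landmark vectors for the axis angle would be extracted from a segmentation is left open here.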
Experimental results
Research questions
- RQ1: Can deep learning models accurately identify the five standard fetal cardiac views from echocardiographic images?
- RQ2: What is the diagnostic performance of deep learning models in distinguishing normal hearts from those with TOF or HLHS?
- RQ3: How do segmentation-based biometric measurements (CTR, CA, FAC) from deep learning compare to clinically measured values?
- RQ4: Can deep learning models achieve higher sensitivity and specificity for TOF and HLHS than current community-level diagnostic rates?
- RQ5: To what extent can deep learning improve early detection of congenital heart disease in routine clinical practice?
Key findings
- The deep learning model achieved an F-score of 0.95 in identifying the five most important fetal cardiac views from echocardiograms.
- Binary classification of normal vs. TOF reached 75% sensitivity and 76% specificity, above the 30–50% diagnosis rate reported for community settings.
- Binary classification of normal vs. HLHS reached 100% sensitivity and 90% specificity, also well above the community-level diagnosis rate.
- Segmentation-based measurements of CTR, CA, and FAC were compatible with clinically measured metrics for normal, TOF, and HLHS hearts.
- HLHS detection was the strongest result: no HLHS case in the holdout test set was missed (100% sensitivity), which matters because prenatal diagnosis can in some cases facilitate in utero intervention.
- Overall, on guideline-recommended imaging, the models' sensitivity for both lesions exceeded the 30–50% diagnosis rate typical of community-level care.
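For readers who want to see how the reported metrics relate to prediction counts, the following is a minimal sketch of sensitivity, specificity, and a binary F-score computed from confusion-matrix counts. The counts in the example are made up to reproduce the TOF percentages and are not the study's actual numbers. The review also does not say how the five-view F-score was averaged across classes, so only the binary form is shown.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: fraction of diseased cases that were flagged."""
    if tp + fn == 0:
        raise ValueError("no positive cases")
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True negative rate: fraction of normal cases that were cleared."""
    if tn + fp == 0:
        raise ValueError("no negative cases")
    return tn / (tn + fp)


def f_score(tp: int, fp: int, fn: int) -> float:
    """Binary F1: harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    if denom == 0:
        raise ValueError("no positive cases or predictions")
    return 2 * tp / denom


# Hypothetical counts chosen to match the reported TOF percentages.
tp, fn = 15, 5     # 15 of 20 TOF cases detected
tn, fp = 76, 24    # 76 of 100 normal cases correctly cleared

tof_sensitivity = sensitivity(tp, fn)   # 0.75
tof_specificity = specificity(tn, fp)   # 0.76
```

Sensitivity is the figure most directly comparable to the community diagnosis rate, since both describe the share of affected fetuses that are identified before birth.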
This review was created by AI and reviewed by human editors.