QUICK REVIEW

[Paper Review] A Robust Ensemble Algorithm for Ischemic Stroke Lesion Segmentation: Generalizability and Clinical Utility Beyond the ISLES Challenge

Ezequiel de la Rosa, Mauricio Reyes|arXiv (Cornell University)|Mar 28, 2024
Acute Ischemic Stroke Management | Cited by 8
One-Sentence Summary

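The authors build a robust ensemble from leading ISLES'22 submissions to detect and segment ischemic stroke lesions on diffusion-weighted MRI. It reaches state-of-the-art accuracy that generalizes across centers, lesion sizes, and stroke patterns, and it remains clinically relevant beyond the challenge.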

ABSTRACT

Diffusion-weighted MRI (DWI) is essential for stroke diagnosis, treatment decisions, and prognosis. However, image and disease variability hinder the development of generalizable AI algorithms with clinical value. We address this gap by presenting a novel ensemble algorithm derived from the 2022 Ischemic Stroke Lesion Segmentation (ISLES) challenge. ISLES'22 provided 400 patient scans with ischemic stroke from various medical centers, facilitating the development of a wide range of cutting-edge segmentation algorithms by the research community. Through collaboration with leading teams, we combined top-performing algorithms into an ensemble model that overcomes the limitations of individual solutions. Our ensemble model achieved superior ischemic lesion detection and segmentation accuracy on our internal test set compared to individual algorithms. This accuracy generalized well across diverse image and disease variables. Furthermore, the model excelled in extracting clinical biomarkers. Notably, in a Turing-like test, neuroradiologists consistently preferred the algorithm's segmentations over manual expert efforts, highlighting increased comprehensiveness and precision. Validation using a real-world external dataset (N=1686) confirmed the model's generalizability. The algorithm's outputs also demonstrated strong correlations with clinical scores (admission NIHSS and 90-day mRS) on par with or exceeding expert-derived results, underlining its clinical relevance. This study offers two key findings. First, we present an ensemble algorithm (https://github.com/Tabrisrei/ISLES22_Ensemble) that detects and segments ischemic stroke lesions on DWI across diverse scenarios on par with expert (neuro)radiologists. Second, we show the potential for biomedical challenge outputs to extend beyond the challenge's initial objectives, demonstrating their real-world clinical applicability.

Motivation and Goals

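  • Motivation: AI for ischemic stroke lesion segmentation must generalize across diverse datasets.
  • Leverage the ISLES'22 challenge results to build a robust ensemble that overcomes the biases of individual methods.
  • Demonstrate generalization to unseen centers, different lesion sizes, and different stroke patterns and vascular territories.
  • Assess clinical utility through correlation with NIHSS and 90-day mRS, and through a Turing-like reader-preference test.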

Proposed Method

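  • Build the ensemble from top ISLES'22 teams (SEALS, NVAUTO, SWAN).
  • Use ISLES'22 data for training and validation, with evaluation on the held-out test set and on an external real-world dataset (N=1686).
  • Preprocess images (resampling to 1x1x1 mm³, z-score normalization) and use DWI/ADC/FLAIR inputs; apply cross-validation and model ensembling.
  • Evaluate with Dice, lesion-wise F1, absolute volume difference (AVD), and absolute lesion count difference (ALD).
  • Evaluate stroke subgroups (stroke pattern and vascular territory) and run a subjective Turing-like assessment by neuroradiologists.
  • Correlate segmentation outputs with clinical scores (admission NIHSS, 90-day mRS).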
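The normalization and ensembling steps listed above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' pipeline: the z-score statistics are taken over nonzero (assumed brain) voxels, resampling is omitted, and the member models' binary masks are fused by voxel-wise majority voting, which is an assumed fusion rule. The function names `zscore_normalize` and `majority_vote` are hypothetical and do not come from the ISLES22_Ensemble repository.

```python
from typing import Sequence

import numpy as np


def zscore_normalize(volume: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Z-score normalize a 3D volume using statistics of its nonzero voxels.

    Assumption: nonzero voxels approximate the brain; background stays 0.
    """
    volume = volume.astype(np.float32)
    mask = volume != 0
    if not mask.any():
        return volume
    mean = volume[mask].mean()
    std = volume[mask].std()
    out = np.zeros_like(volume)
    out[mask] = (volume[mask] - mean) / (std + eps)
    return out


def majority_vote(masks: Sequence[np.ndarray]) -> np.ndarray:
    """Fuse binary lesion masks from several models (assumed fusion rule).

    A voxel is labeled lesion when more than half of the models mark it.
    """
    stacked = np.stack([m.astype(bool) for m in masks], axis=0)
    votes = stacked.sum(axis=0)
    return votes > (len(masks) / 2)
```

With three member models, as in the SEALS/NVAUTO/SWAN ensemble, a voxel needs at least two votes to be kept, which suppresses false positives produced by a single model.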
Figure 1: Overview of the ISLES’22 challenge and post-challenge experimental design, including the developed algorithmic solutions. A) Challenge and post-challenge phases and datasets. B) Summary of algorithmic solutions stratified by network architecture, loss function, and input modalities. C) Cha

Experimental Results

Research Questions

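  • RQ1: Can a challenge-derived ensemble generalize to unseen imaging centers and real-world data?
  • RQ2: How does the ensemble perform across lesion-size, stroke-phase, and stroke-pattern subgroups?
  • RQ3: Can the ensemble identify the affected vascular territory and stroke subtype with high accuracy?
  • RQ4: In a Turing-like test, do clinicians prefer the ensemble's segmentations over manual expert delineations?
  • RQ5: On external data, do the segmentation outputs correlate with key clinical outcomes (NIHSS, 90-day mRS)?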

Key Findings

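  • On unseen ISLES'22 test data, the ensemble achieves leading performance (median Dice 0.82, median lesion-wise F1 0.86).
  • The external real-world dataset (N=1686) shows similar generalization (median Dice 0.82, median lesion-wise F1 0.86).
  • Lesion-size analysis shows high volume agreement (overall Pearson r = 0.98; <5 ml: r = 0.87; 5–20 ml: r = 0.90; ≥20 ml: r = 0.96).
  • Performance is robust across centers (including unseen centers) and stroke phases, although Dice scores differ somewhat between acute and subacute scans because of phase-related factors.
  • The ensemble outperforms individual challenge solutions in stroke-pattern classification (balanced accuracy 86.9% vs. 78.9% for the best single solution) and in vascular-territory identification (balanced accuracy 97.6%).
  • In the Turing-like assessment, neuroradiologists preferred the ensemble's segmentations over manual expert delineations.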
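The agreement statistics quoted above, Pearson's r between predicted and reference lesion volumes and balanced accuracy for categorical labels such as stroke pattern, can be computed as below. This is a minimal NumPy sketch; the helper names `lesion_volume_ml`, `pearson_r`, and `balanced_accuracy` are hypothetical, voxel spacing is assumed to be given in mm, and any inputs you pass are your own, not data from the study.

```python
import numpy as np


def lesion_volume_ml(mask: np.ndarray, spacing_mm=(1.0, 1.0, 1.0)) -> float:
    """Lesion volume in ml from a binary mask (1 ml = 1000 mm^3)."""
    voxel_mm3 = float(np.prod(spacing_mm))
    return float(mask.astype(bool).sum()) * voxel_mm3 / 1000.0


def pearson_r(x, y) -> float:
    """Pearson correlation between two 1D sequences, e.g. predicted vs. reference volumes."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))


def balanced_accuracy(y_true, y_pred) -> float:
    """Mean of per-class recall, so rare classes (e.g. uncommon stroke patterns) count equally."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))
```

Grouping cases by reference volume at 5 ml and 20 ml before calling `pearson_r` reproduces the size strata reported above; balanced accuracy is the natural choice when some stroke patterns or territories are much rarer than others.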
Figure 2: Performance for the participating teams in the unseen test phase of the challenge. Teams are displayed in red and in decreasing order based on their final rank. DSC: Dice Similarity Coefficient; F1 score: lesion-wise F1 score; AVD: absolute volume difference; ALD: absolute lesion count dif
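The four metrics named in the Figure 2 caption (DSC, lesion-wise F1, AVD, ALD) can be sketched in Python as follows. This is not the official challenge evaluation code; it rests on stated assumptions: lesions are 26-connected components, a reference lesion counts as detected if any predicted voxel overlaps it, Dice is 1.0 when both masks are empty, and voxel volume is given in mm³. The helper names are hypothetical, and in practice `scipy.ndimage.label` would replace the hand-written flood fill for speed.

```python
from collections import deque
from itertools import product

import numpy as np

# 26-connectivity: every 3D neighbor offset except the voxel itself (assumption).
_OFFSETS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def label_components(mask: np.ndarray) -> tuple[np.ndarray, int]:
    """Label connected lesions in a 3D binary mask with a BFS flood fill."""
    mask = mask.astype(bool)
    labels = np.zeros(mask.shape, dtype=np.int32)
    n = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue
        n += 1
        labels[start] = n
        queue = deque([start])
        while queue:
            z, y, x = queue.popleft()
            for dz, dy, dx in _OFFSETS:
                nb = (z + dz, y + dy, x + dx)
                # Bounds check first so negative indices never wrap around.
                if all(0 <= c < s for c, s in zip(nb, mask.shape)) and mask[nb] and not labels[nb]:
                    labels[nb] = n
                    queue.append(nb)
    return labels, n


def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """Voxel-wise Dice similarity coefficient (1.0 if both masks are empty)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    return 1.0 if denom == 0 else float(2.0 * np.logical_and(pred, gt).sum() / denom)


def lesion_f1(pred: np.ndarray, gt: np.ndarray) -> float:
    """Lesion-wise F1 = 2TP / (2TP + FP + FN) with an any-overlap matching rule."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    gt_labels, n_gt = label_components(gt)
    pred_labels, n_pred = label_components(pred)
    tp = sum(1 for i in range(1, n_gt + 1) if pred[gt_labels == i].any())
    fn = n_gt - tp
    fp = sum(1 for j in range(1, n_pred + 1) if not gt[pred_labels == j].any())
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom


def avd_ml(pred: np.ndarray, gt: np.ndarray, voxel_mm3: float = 1.0) -> float:
    """Absolute volume difference in ml."""
    diff = abs(int(pred.astype(bool).sum()) - int(gt.astype(bool).sum()))
    return diff * voxel_mm3 / 1000.0


def ald(pred: np.ndarray, gt: np.ndarray) -> int:
    """Absolute difference in the number of connected lesions."""
    return abs(label_components(pred)[1] - label_components(gt)[1])
```

Lesion-wise F1 and ALD complement Dice: Dice is dominated by large lesions, whereas the lesion-level scores penalize missed or spurious small infarcts, which matter for stroke-pattern assessment.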


This review was generated by AI and checked by human editors.