[Paper Summary] Evaluation of Algorithms for Multi-Modality Whole Heart Segmentation: An Open-Access Grand Challenge
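This paper presents the Multi-Modality Whole Heart Segmentation (MM-WHS) challenge, a large open-access challenge that evaluated 21 whole heart segmentation algorithms on 120 clinical CT and MRI volumes with manually delineated reference segmentations. Despite limited training data, deep learning methods achieved high accuracy, whereas conventional multi-atlas methods were more robust. This contrast highlights the need for hybrid models that combine learning with anatomical priors.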
Knowledge of whole heart anatomy is a prerequisite for many clinical applications. Whole heart segmentation (WHS), which delineates substructures of the heart, can be very valuable for modeling and analysis of the anatomy and functions of the heart. However, automating this segmentation can be arduous due to the large variation of the heart shape, and different image qualities of the clinical data. To achieve this goal, a set of training data is generally needed for constructing priors or for training. In addition, it is difficult to perform comparisons between different methods, largely due to differences in the datasets and evaluation metrics used. This manuscript presents the methodologies and evaluation results for the WHS algorithms selected from the submissions to the Multi-Modality Whole Heart Segmentation (MM-WHS) challenge, in conjunction with MICCAI 2017. The challenge provides 120 three-dimensional cardiac images covering the whole heart, including 60 CT and 60 MRI volumes, all acquired in clinical environments with manual delineation. Ten algorithms for CT data and eleven algorithms for MRI data, submitted from twelve groups, have been evaluated. The results show that many of the deep learning (DL) based methods achieved high accuracy, even though the number of training datasets was limited. A number of them also reported poor results in the blinded evaluation, probably due to overfitting in their training. The conventional algorithms, mainly based on multi-atlas segmentation, demonstrated robust and stable performance, even though the accuracy is not as good as the best DL method in CT segmentation. The challenge, including the provision of the annotated training data and the blinded evaluation for submitted algorithms on the test data, continues as an ongoing benchmarking resource via its homepage (www.sdspeople.fudan.edu.cn/zhuangxiahai/0/mmwhs/).
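Motivation and Goals
- Establish a standardized, open-access benchmark for multi-modality whole heart segmentation (WHS) built on clinical CT and MRI data.
- Evaluate multiple WHS algorithms under identical training and test conditions so that comparisons are fair.
- Identify the strengths and weaknesses of deep learning and conventional methods in handling anatomical variability and differences in image quality.
- Provide a publicly available dataset with expert manual delineations to support the development and validation of future algorithms.
- Promote reproducible research through a blinded evaluation framework and ongoing access to the training and test data.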
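Proposed Method
- The challenge used 120 three-dimensional whole-heart images (60 CT and 60 MRI), all acquired in clinical settings and accompanied by expert manual delineations.
- All participating algorithms were trained on the same open-access dataset and evaluated on a blinded test set, ensuring fairness and reproducibility.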
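- Segmentation accuracy was measured with the Dice similarity coefficient (DSC) and the Hausdorff distance (HD) for seven cardiac substructures: the LV, RV, LA and RA blood cavities, the LV myocardium, the ascending aorta (AO), and the pulmonary artery (PA).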
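- The submitted methods included deep learning (e.g., U-Net variants), multi-atlas segmentation, and hybrid approaches that incorporate shape priors or multi-modality information.
- The evaluation framework is hosted online and remains active, accepting new submissions for comparison.
- Participants submitted both their results and detailed algorithm descriptions, enabling methodological analysis and reproducibility checks.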
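The two metrics named above can be sketched in a few lines of Python. This is a minimal, stdlib-only sketch, not the challenge's official evaluation code. It assumes label volumes are nested lists indexed as `volume[z][y][x]` with 0 as background, treats the integer label codes as placeholders, and computes HD by brute force over all voxels of each label rather than over extracted surfaces, which is only practical for tiny examples.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Voxel = Tuple[int, int, int]
Volume = List[List[List[int]]]  # volume[z][y][x]; 0 is background


def voxels_with_label(volume: Volume, label: int) -> Set[Voxel]:
    """Collect the (z, y, x) coordinates of every voxel that carries `label`."""
    return {
        (z, y, x)
        for z, plane in enumerate(volume)
        for y, row in enumerate(plane)
        for x, value in enumerate(row)
        if value == label
    }


def dice(pred: Set[Voxel], ref: Set[Voxel]) -> float:
    """Dice similarity coefficient: 2|A & B| / (|A| + |B|)."""
    if not pred and not ref:
        return 1.0  # convention: two empty masks agree perfectly
    return 2.0 * len(pred & ref) / (len(pred) + len(ref))


def hausdorff(pred: Set[Voxel], ref: Set[Voxel],
              spacing: Sequence[float] = (1.0, 1.0, 1.0)) -> float:
    """Symmetric Hausdorff distance in physical units (brute force, O(|A| * |B|))."""
    if not pred or not ref:
        return math.inf  # undefined when one mask is empty

    def dist(a: Voxel, b: Voxel) -> float:
        # Scale each axis by its voxel spacing (z, y, x order) before the Euclidean norm.
        return math.sqrt(sum(((ai - bi) * s) ** 2 for ai, bi, s in zip(a, b, spacing)))

    def directed(src: Set[Voxel], dst: Set[Voxel]) -> float:
        # Worst-case distance from a voxel in src to its nearest voxel in dst.
        return max(min(dist(a, b) for b in dst) for a in src)

    return max(directed(pred, ref), directed(ref, pred))


def evaluate(pred_vol: Volume, ref_vol: Volume, labels: Sequence[int],
             spacing: Sequence[float] = (1.0, 1.0, 1.0)) -> Dict[int, Tuple[float, float]]:
    """Return {label: (Dice, HD)} for each substructure label."""
    scores: Dict[int, Tuple[float, float]] = {}
    for label in labels:
        p = voxels_with_label(pred_vol, label)
        r = voxels_with_label(ref_vol, label)
        scores[label] = (dice(p, r), hausdorff(p, r, spacing))
    return scores


ref = [[[1, 1, 2, 2]]]   # one slice, one row, four voxels
pred = [[[1, 1, 1, 2]]]  # label 1 leaks into one voxel of label 2
print(evaluate(pred, ref, labels=[1, 2]))
```

In this toy case label 1 scores Dice 0.8 and HD 1.0, and label 2 scores Dice of about 0.667 and HD 1.0. For real CT or MRI volumes, a distance-transform-based implementation would replace the quadratic loop.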
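Experimental Results
Research Questions
- RQ1: How do deep learning-based WHS methods perform on clinical CT and MRI data compared with conventional multi-atlas methods?
- RQ2: How does limited training data affect the generalization and robustness of deep learning models for WHS?
- RQ3: Why do some deep learning models produce unrealistic shapes despite high Dice scores?
- RQ4: How do differences in image quality and anatomical diversity affect segmentation performance across modalities?
- RQ5: Can hybrid methods that combine deep learning with anatomical priors improve segmentation stability and accuracy?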
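RQ3 can be made concrete with a toy example. Because Dice counts overlapping voxels, a few spurious voxels far from the heart barely move it, while the Hausdorff distance is dominated by the single worst outlier. The sketch below is an illustration, not taken from the paper: it scores a 5x5x5 reference cube against a prediction with one stray voxel, then applies a hypothetical `largest_component` clean-up that keeps the largest 6-connected component. That is a common post-processing step, not necessarily what any challenge entry used.

```python
import math
from collections import deque
from typing import Set, Tuple

Voxel = Tuple[int, int, int]


def dice(a: Set[Voxel], b: Set[Voxel]) -> float:
    if not a and not b:
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))


def hausdorff(a: Set[Voxel], b: Set[Voxel]) -> float:
    """Symmetric Hausdorff distance with unit voxel spacing (brute force)."""
    def directed(src: Set[Voxel], dst: Set[Voxel]) -> float:
        return max(min(math.dist(p, q) for q in dst) for p in src)
    return max(directed(a, b), directed(b, a))


def largest_component(mask: Set[Voxel]) -> Set[Voxel]:
    """Keep only the largest 6-connected component of a binary mask (hypothetical helper)."""
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    remaining = set(mask)
    best: Set[Voxel] = set()
    while remaining:
        seed = remaining.pop()
        component = {seed}
        queue = deque([seed])
        # Breadth-first flood fill over face-adjacent neighbours.
        while queue:
            z, y, x = queue.popleft()
            for dz, dy, dx in steps:
                n = (z + dz, y + dy, x + dx)
                if n in remaining:
                    remaining.remove(n)
                    component.add(n)
                    queue.append(n)
        if len(component) > len(best):
            best = component
    return best


reference = {(z, y, x) for z in range(5) for y in range(5) for x in range(5)}
prediction = reference | {(0, 0, 40)}  # one spurious voxel far from the structure

print(round(dice(prediction, reference), 3), hausdorff(prediction, reference))
cleaned = largest_component(prediction)
print(round(dice(cleaned, reference), 3), hausdorff(cleaned, reference))
```

The first line prints a Dice of about 0.996 alongside an HD of 36 voxels; after the clean-up the prediction matches the reference exactly. Shape-level errors such as disconnected fragments therefore show up in distance-based metrics far more clearly than in Dice.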
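Key Findings
- Deep learning-based methods achieved high segmentation accuracy; the best-performing models (e.g., GUT, UB1*, UB2*) reached mean Dice scores above 0.90 for the left and right ventricles on CT data.
- Despite this strong performance, some deep learning models generalized poorly in the blinded evaluation and produced unrealistic shapes, likely because they overfitted the small training set.
- Conventional multi-atlas segmentation methods performed more stably and consistently across subjects, although their accuracy was slightly lower than that of the best deep learning models.
- All methods segmented the four chambers (LV, RV, LA, RA) with generally good accuracy, whereas the great vessels (AO, PA) remained challenging, particularly on MRI.
- The challenge showed that WHS is harder on MRI than on CT, mainly because of lower image quality, inconsistent contrast, and greater anatomical variability.
- The open-access dataset and evaluation platform continue to serve as a benchmark for future research, with results and data publicly available for ongoing algorithm development and comparison.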
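One reason multi-atlas methods tend to be robust is label fusion: each atlas's labels are propagated to the target image, and per-voxel voting dilutes the errors of any single registration. The following is a minimal sketch of plain majority voting, the simplest fusion rule. It assumes the registration step has already been done and that each propagated label map has been flattened to a list of the same length; the challenge's multi-atlas entries may have used more elaborate, weighted fusion schemes.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(propagated: Sequence[Sequence[int]]) -> List[int]:
    """Fuse atlas label maps that are already warped into the target image space.

    Each map is a flattened sequence of integer labels of identical length.
    """
    if not propagated:
        raise ValueError("need at least one propagated label map")
    n_voxels = len(propagated[0])
    if any(len(m) != n_voxels for m in propagated):
        raise ValueError("all propagated label maps must have the same length")
    fused: List[int] = []
    for i in range(n_voxels):
        votes = Counter(m[i] for m in propagated)
        # Most votes wins; ties go to the smallest label value so results are deterministic.
        label, _ = max(votes.items(), key=lambda kv: (kv[1], -kv[0]))
        fused.append(label)
    return fused


atlas_a = [0, 1, 1, 2, 2]
atlas_b = [0, 1, 1, 2, 0]
atlas_c = [3, 3, 0, 0, 1]  # a badly registered atlas
print(majority_vote([atlas_a, atlas_b, atlas_c]))
```

This prints `[0, 1, 1, 2, 0]`: the badly registered third atlas is outvoted wherever the other two agree, and at the last voxel, where all three disagree, the tie-breaking rule falls back to label 0.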
This summary was generated by AI and reviewed by human editors.