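[Paper Summary] SAR-U-Net: squeeze-and-excitation block and atrous spatial pyramid pooling based residual U-Net for automatic liver CT segmentation.
The paper proposes SAR-U-Net, a 2D U-Net variant that adds squeeze-and-excitation (SE) blocks for attention-based feature recalibration, atrous spatial pyramid pooling (ASPP) for multi-scale context aggregation, and residual learning to support training of deeper networks. The model reports mean Dice scores of 95.71% on LiTS17 and 97.31% on SLiver07, the highest among the closely related models it was compared against, and it remains robust in challenging liver CT segmentation cases.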
Background and objective: In this paper, a modified U-Net based framework is presented, which leverages the Squeeze-and-Excitation (SE) block, Atrous Spatial Pyramid Pooling (ASPP) and residual learning for accurate and robust liver CT segmentation. The effectiveness of the proposed method was tested on two public datasets, LiTS17 and SLiver07.
Methods: A new network architecture called SAR-U-Net was designed. Firstly, the SE block was introduced to adaptively extract image features after each convolution in the U-Net encoder, suppressing irrelevant regions and highlighting features relevant to the specific segmentation task. Secondly, ASPP was employed to replace the transition layer and the output layer, acquiring multi-scale image information via different receptive fields. Thirdly, to alleviate the degradation problem, the traditional convolution block was replaced with the residual block, thus prompting the network to gain accuracy from considerably increased depth.
Results: In the LiTS17 experiment, the mean values of Dice, VOE, RVD, ASD and MSD were 95.71, 9.52, -0.84, 1.54 and 29.14, respectively. Compared with other closely related 2D-based models, the proposed method achieved the highest accuracy. In the SLiver07 experiment, the mean values of Dice, VOE, RVD, ASD and MSD were 97.31, 5.37, -1.08, 1.85 and 27.45, respectively. Compared with other closely related models, the proposed method achieved the highest segmentation accuracy except for the RVD.
Conclusion: The proposed model yields a great improvement in accuracy compared to 2D-based models, and its robustness in handling challenging problems, such as small liver regions, discontinuous liver regions, and fuzzy liver boundaries, is also well demonstrated and validated.
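Motivation and Objectives
- Improve the accuracy and robustness of automatic liver segmentation in CT scans, especially for small, discontinuous, or fuzzy-boundary liver regions.
- Address the limitations of the standard U-Net in capturing multi-scale contextual features and in suppressing irrelevant features.
- Make deeper networks trainable by alleviating the degradation problem that commonly affects very deep architectures.
- Validate the proposed architecture on the public benchmark datasets LiTS17 and SLiver07.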
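Proposed Method
- Integrate a squeeze-and-excitation (SE) block after each convolution in the U-Net encoder, adaptively recalibrating the feature maps by emphasizing task-relevant channels.
- Replace the transition layer and the output layer with atrous spatial pyramid pooling (ASPP), which captures multi-scale context through parallel atrous convolutions with different dilation rates.
- Replace standard convolution blocks with residual blocks to alleviate the degradation problem and let the network gain accuracy from increased depth.
- Combine the SE block, ASPP, and residual learning into a single U-Net based architecture, named SAR-U-Net, for end-to-end liver segmentation.
- Train the network on two public liver CT datasets with standard supervised learning, using a combination of cross-entropy loss and Dice loss.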
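The three building blocks listed above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the weight matrices `w_reduce` and `w_expand`, the reduction to 1D for the atrous part, and the dilation rates `(1, 2, 4)` are all assumptions chosen for illustration, and the paper's actual layer layout may differ.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def se_recalibrate(features, w_reduce, w_expand):
    """Squeeze-and-excitation over a (channels, height, width) feature map.

    w_reduce has shape (channels // r, channels) and w_expand has shape
    (channels, channels // r). Both are hypothetical stand-ins for learned
    parameters; biases are omitted for brevity.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = features.mean(axis=(1, 2))
    # Excitation: bottleneck (ReLU, then sigmoid) yields a gate in (0, 1).
    hidden = np.maximum(w_reduce @ squeezed, 0.0)
    gates = sigmoid(w_expand @ hidden)
    # Scale: reweight every channel by its gate.
    return features * gates[:, None, None], gates


def residual_block(features, transform):
    """Identity shortcut: output = input + transform(input)."""
    return features + transform(features)


def atrous_conv1d(signal, kernel, rate):
    """Zero-padded, same-length 1D atrous (dilated) cross-correlation.

    Assumes an odd-length kernel; taps are spaced `rate` samples apart.
    """
    signal = np.asarray(signal, dtype=float)
    k = len(kernel)
    half = (k // 2) * rate
    padded = np.pad(signal, half)
    out = np.zeros(len(signal))
    for i in range(len(signal)):
        for j in range(k):
            out[i] += kernel[j] * padded[i + j * rate]
    return out


def aspp1d(signal, kernel, rates=(1, 2, 4)):
    """Apply one kernel at several dilation rates and stack the branches."""
    return np.stack([atrous_conv1d(signal, kernel, r) for r in rates])
```

With a three-tap kernel and `rate=2`, each output sample sees five input positions instead of three, which is how parallel branches with different rates cover several receptive-field sizes without adding weights. A full ASPP module would normally use a separate learned kernel per branch and fuse the branches with a further convolution; that step is left out here.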
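Experimental Results
Research Questions
- RQ1: Does introducing the SE block improve feature representation in liver CT segmentation by focusing on relevant channels?
- RQ2: Does ASPP improve segmentation performance by capturing multi-scale contextual features in liver CT images?
- RQ3: Does residual learning enable a deeper and more accurate U-Net architecture while avoiding network degradation?
- RQ4: How does SAR-U-Net compare with existing 2D models in segmentation accuracy and robustness on challenging liver CT cases?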
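Main Findings
- On LiTS17, SAR-U-Net achieved a mean Dice score of 95.71%, the highest accuracy among the closely related 2D models it was compared with.
- On LiTS17, the mean VOE was 9.52%, RVD -0.84%, ASD 1.54 mm, and MSD 29.14 mm, indicating high overlap with the reference segmentation and a small average surface distance.
- On SLiver07, SAR-U-Net reached a mean Dice score of 97.31%, with VOE 5.37%, RVD -1.08%, ASD 1.85 mm, and MSD 27.45 mm; it was the most accurate of the compared models on every metric except RVD.
- The model is robust on challenging cases, including small liver regions, discontinuous liver structures, and fuzzy liver boundaries.
- Compared with the standard U-Net and related architectures, the combination of SE, ASPP, and residual learning substantially improves segmentation performance.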
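For readers unfamiliar with the volume-based metrics reported above, the following sketch computes Dice, VOE, and RVD for a pair of binary masks. It uses the commonly cited definitions and assumes the sign convention RVD = (predicted - reference) / reference; the paper's exact evaluation code is not shown in this summary, and the function name `overlap_metrics` is hypothetical. The surface-distance metrics ASD and MSD are omitted because they require boundary extraction and voxel spacing.

```python
def overlap_metrics(pred, truth):
    """Return (dice, voe, rvd) in percent for two flat binary masks.

    pred and truth are equal-length sequences of 0/1 voxel labels.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    pred_vol = sum(1 for p in pred if p)
    truth_vol = sum(1 for t in truth if t)
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    union = pred_vol + truth_vol - inter
    if truth_vol == 0:
        raise ValueError("reference mask is empty")
    # Dice: twice the overlap divided by the sum of both volumes.
    dice = 200.0 * inter / (pred_vol + truth_vol)
    # VOE: one minus the Jaccard index (intersection over union).
    voe = 100.0 * (1.0 - inter / union)
    # RVD: signed volume difference relative to the reference volume.
    rvd = 100.0 * (pred_vol - truth_vol) / truth_vol
    return dice, voe, rvd


# Toy example: the prediction misses one of four reference voxels.
dice, voe, rvd = overlap_metrics([1, 1, 1, 0, 0, 0], [1, 1, 1, 1, 0, 0])
```

Under this convention a negative RVD, as reported for both datasets, means the predicted volume is on average slightly smaller than the reference volume.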
This summary was generated by AI and reviewed by a human editor.