QUICK REVIEW

[Paper Review] DS-TransUNet: Dual Swin Transformer U-Net for Medical Image Segmentation

Ailiang Lin, Bingzhi Chen | arXiv (Cornell University) | Jun 12, 2021

Advanced Neural Network Applications | Cited by 41

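One-sentence summary

DS-TransUNet introduces a dual-scale Swin Transformer encoder and a Transformer Interactive Fusion module into a U-shaped architecture to capture long-range dependencies and multi-scale context in medical image segmentation, achieving state-of-the-art results on multiple datasets, including polyp segmentation, ISIC 2018, GLAS, and the 2018 Data Science Bowl.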

ABSTRACT

Automatic medical image segmentation has made great progress, benefiting from the development of deep learning. However, most existing methods are based on convolutional neural networks (CNNs), which fail to build long-range dependencies and global context connections due to the limitation of receptive field in convolution operation. Inspired by the success of Transformer in modeling the long-range contextual information, some researchers have expended considerable effort in designing robust variants of Transformer-based U-Net. Moreover, the patch division used in vision transformers usually ignores the pixel-level intrinsic structural features inside each patch. To alleviate these problems, we propose a novel deep medical image segmentation framework called Dual Swin Transformer U-Net (DS-TransUNet), which might be the first attempt to concurrently incorporate the advantages of hierarchical Swin Transformer into both encoder and decoder of the standard U-shaped architecture to enhance the semantic segmentation quality of varying medical images. Unlike many prior Transformer-based solutions, the proposed DS-TransUNet first adopts dual-scale encoder subnetworks based on Swin Transformer to extract the coarse and fine-grained feature representations of different semantic scales. As the core component for our DS-TransUNet, a well-designed Transformer Interactive Fusion (TIF) module is proposed to effectively establish global dependencies between features of different scales through the self-attention mechanism. Furthermore, we also introduce the Swin Transformer block into the decoder to further explore the long-range contextual information during the up-sampling process. Extensive experiments across four typical tasks for medical image segmentation demonstrate the effectiveness of DS-TransUNet, and show that our approach significantly outperforms the state-of-the-art methods.

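Research motivation and goals

Improve medical image segmentation by integrating Transformer-based long-range context modeling into both the encoder and the decoder of U-Net.
Propose a dual-scale Swin Transformer encoder that extracts coarse- and fine-grained feature representations.
Develop the Transformer Interactive Fusion (TIF) module to fuse multi-scale features globally.
Integrate Swin Transformer blocks into the decoder to strengthen long-range dependencies during up-sampling.
Demonstrate robustness across four medical segmentation tasks and datasets.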

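Proposed method

Uses a dual-branch Swin Transformer encoder that operates on large-scale and small-scale patches to obtain coarse- and fine-grained features.
Introduces Transformer Interactive Fusion (TIF), which fuses multi-scale encoder features through self-attention.
Integrates Swin Transformer blocks at every decoder stage to restore spatial resolution with global context.
Uses multi-scale training and deep supervision, with loss terms on intermediate outputs, to improve convergence.
Trains and evaluates on polyp segmentation, ISIC 2018, GLAS, and 2018 Data Science Bowl datasets.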
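
To make the Transformer Interactive Fusion (TIF) step above more concrete, here is a minimal, dependency-free Python sketch of one plausible reading of it. This review only states that TIF fuses multi-scale features through self-attention, so everything else is an assumption made for illustration: the other scale is summarized by a single average-pooled token, attention is single-head with identity Q/K/V projections (no learned weights), and all function names (`self_attention`, `mean_token`, `fuse_one_direction`, `interactive_fusion`) are hypothetical. It is not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head self-attention with identity Q/K/V projections.

    Assumption: a real Transformer layer would use learned projections,
    multiple heads, residual connections, and an MLP; all are omitted here.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    output = []
    for query in tokens:
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in tokens]
        weights = softmax(scores)
        output.append(
            [sum(w * value[c] for w, value in zip(weights, tokens)) for c in range(dim)]
        )
    return output


def mean_token(tokens):
    """Global average pooling: collapse a token sequence into one summary token."""
    count = len(tokens)
    return [sum(tok[c] for tok in tokens) / count for c in range(len(tokens[0]))]


def fuse_one_direction(own_tokens, other_tokens):
    """Append a pooled summary of the other scale, attend, then drop the summary."""
    sequence = own_tokens + [mean_token(other_tokens)]
    attended = self_attention(sequence)
    return attended[: len(own_tokens)]


def interactive_fusion(fine_tokens, coarse_tokens):
    """Fuse in both directions; each branch keeps its own token count."""
    fine_out = fuse_one_direction(fine_tokens, coarse_tokens)
    coarse_out = fuse_one_direction(coarse_tokens, fine_tokens)
    return fine_out, coarse_out


# Toy input: a 4x4 fine grid (16 tokens) and a 2x2 coarse grid (4 tokens),
# 3 channels each. The values are arbitrary.
fine = [[float(i % 3), 1.0, 0.5 * i] for i in range(16)]
coarse = [[1.0, float(j), 0.0] for j in range(4)]
fine_fused, coarse_fused = interactive_fusion(fine, coarse)
print(len(fine_fused), len(coarse_fused), len(fine_fused[0]))  # 16 4 3
```

In this sketch each branch attends over its own tokens plus one summary of the other branch, so the cost stays close to that of ordinary self-attention on a single scale while still letting global information cross between scales.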

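Experimental results

Research questions

RQ1: Does the dual-scale Swin Transformer encoder improve multi-scale feature learning in medical image segmentation?
RQ2: Does the Transformer-based fusion module (TIF) effectively integrate coarse- and fine-grained features across scales?
RQ3: Does adding Swin Transformer blocks to the decoder strengthen long-range dependencies during up-sampling?
RQ4: How does DS-TransUNet perform on diverse medical segmentation tasks compared with state-of-the-art methods?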

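Key findings

DS-TransUNet variants outperform previous SOTA methods on polyp segmentation across multiple datasets.
On the Kvasir polyp dataset, DS-TransUNet-L reaches mDice 0.913, mIoU 0.859, recall 0.936, and precision 0.916.
On ClinicDB, DS-TransUNet-L obtains F1 0.9422, mIoU 0.8939, recall 0.9500, and precision 0.9369.
On unseen polyp segmentation datasets, DS-TransUNet shows strong generalization and clearly exceeds competing methods.
The method consistently outperforms TransFuse and other baselines across the segmentation tasks (polyp, ISIC 2018, GLAS, and 2018 Data Science Bowl).
Qualitative results show sharper boundary delineation and greater robustness on challenging polyps.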
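
For readers less familiar with the metrics quoted above, the following Python sketch shows how Dice, IoU, recall, and precision are commonly computed from a binary prediction mask and a ground-truth mask. It is an illustration under stated assumptions, not the paper's evaluation code: masks are flat lists of 0/1 values, the "m" prefix is taken to mean an average over images, an empty denominator yields 0.0 (other conventions exist), and the helper names `safe_div`, `segmentation_metrics`, and `mean_metric` are hypothetical.

```python
def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when the denominator is zero (a convention choice)."""
    return numerator / denominator if denominator else 0.0


def segmentation_metrics(pred, target):
    """Compute Dice, IoU, recall, and precision for two flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)  # true positives
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)  # false positives
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)  # false negatives
    return {
        "dice": safe_div(2 * tp, 2 * tp + fp + fn),
        "iou": safe_div(tp, tp + fp + fn),
        "recall": safe_div(tp, tp + fn),
        "precision": safe_div(tp, tp + fp),
    }


def mean_metric(mask_pairs, name):
    """Average one metric over (pred, target) pairs, i.e. a per-image mean."""
    values = [segmentation_metrics(pred, target)[name] for pred, target in mask_pairs]
    return safe_div(sum(values), len(values))


# Toy 8-pixel example: 3 true positives, 1 false positive, 1 false negative.
pred = [1, 1, 1, 0, 0, 0, 1, 0]
target = [1, 1, 0, 0, 0, 1, 1, 0]
print(segmentation_metrics(pred, target))
```

For a single binary mask, Dice equals the F1 score 2PR / (P + R), where P is precision and R is recall, which is why one row above reports F1 and the other mDice. When the metrics are averaged per image, the averaged values need not satisfy that identity exactly.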


This review was generated by AI and reviewed by a human editor.