[Paper Digest] TransAttUnet: Multi-level Attention-guided U-Net with Transformer for Medical Image Segmentation
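TransAttUnet combines Transformer-based self-attention and global spatial attention with multi-scale skip connections, and reports better results than state-of-the-art baselines on medical image segmentation datasets from multiple imaging modalities.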
Accurate segmentation of organs or lesions from medical images is crucial for reliable diagnosis of diseases and organ morphometry. In recent years, convolutional encoder-decoder solutions have achieved substantial progress in the field of automatic medical image segmentation. Due to the inherent bias in the convolution operations, prior models mainly focus on local visual cues formed by the neighboring pixels, but fail to fully model the long-range contextual dependencies. In this paper, we propose a novel Transformer-based Attention Guided Network called TransAttUnet, in which the multi-level guided attention and multi-scale skip connection are designed to jointly enhance the performance of the semantical segmentation architecture. Inspired by Transformer, the self-aware attention (SAA) module with Transformer Self Attention (TSA) and Global Spatial Attention (GSA) is incorporated into TransAttUnet to effectively learn the non-local interactions among encoder features. Moreover, we also use additional multi-scale skip connections between decoder blocks to aggregate the upsampled features with different semantic scales. In this way, the representation ability of multi-scale context information is strengthened to generate discriminative features. Benefitting from these complementary components, the proposed TransAttUnet can effectively alleviate the loss of fine details caused by the stacking of convolution layers and the consecutive sampling operations, finally improving the segmentation quality of medical images. Extensive experiments on multiple medical image segmentation datasets from different imaging modalities demonstrate that the proposed method consistently outperforms the state-of-the-art baselines. Our code and pre-trained models are available at: https://github.com/YishuLiu/TransAttUnet.
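Motivation and Objectives
- Improve medical image segmentation by addressing the locality bias of convolutional encoders.
- Propose a Transformer-based self-aware attention (SAA) module that combines Transformer self-attention with global spatial attention.
- Introduce multi-scale skip connections to better fuse multi-scale decoder features.
- Demonstrate the method's effectiveness across diverse medical imaging modalities.
- Compare fairly against strong baselines and run ablation studies.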
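Proposed Method
- Integrate the self-aware attention (SAA) module at the encoder-decoder bridge to fuse the TSA and GSA outputs with the encoder features.
- Use Transformer self-attention (TSA), with multi-head attention and learned positional encoding, to model long-range dependencies.
- Apply global spatial attention (GSA) to capture global context through position-aware channel interactions.
- Fuse the TSA and GSA outputs with the encoder features through a learnable weighted combination (F_SAA).
- Add multi-scale skip connections between decoder blocks that progressively aggregate features from different semantic scales through residual or dense connections.
- Train with a joint BCE and Dice loss (L = alpha*L_BCE + beta*L_Dice) to balance pixel-level accuracy against segmentation overlap.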
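To make the attention fusion more concrete, here is a minimal pure-Python sketch, not the authors' implementation. It assumes single-head scaled dot-product self-attention with identity projections (Q = K = V), omits the multi-head projections and positional encoding mentioned above, and assumes the fusion takes the form F_SAA = lam1*TSA + lam2*GSA + F. The function names `self_attention` and `fuse_saa`, and the zero-valued stand-in for the GSA output, are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(feats):
    """Single-head scaled dot-product self-attention with Q = K = V = feats.

    feats: list of N position vectors (a flattened H*W grid), each of length C.
    Assumption: identity projections; a real TSA block learns Q/K/V projections.
    """
    channels = len(feats[0])
    scale = 1.0 / math.sqrt(channels)
    out = []
    for query in feats:
        # Similarity of this position to every position in the feature map.
        scores = [scale * sum(a * b for a, b in zip(query, key)) for key in feats]
        weights = softmax(scores)
        # Weighted average of all position vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, feats))
                    for j in range(channels)])
    return out


def fuse_saa(feats, tsa_out, gsa_out, lam1, lam2):
    """Assumed fusion: F_SAA = lam1 * TSA + lam2 * GSA + F (element-wise)."""
    return [[lam1 * t + lam2 * g + f for f, t, g in zip(f_row, t_row, g_row)]
            for f_row, t_row, g_row in zip(feats, tsa_out, gsa_out)]


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 positions, 2 channels
    tsa = self_attention(feats)
    gsa_placeholder = [[0.0, 0.0] for _ in feats]  # stand-in, not a real GSA
    print(fuse_saa(feats, tsa, gsa_placeholder, lam1=0.5, lam2=0.5))
```

In a trained model lam1 and lam2 would be learned parameters; here they are plain floats so the data flow stays visible.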
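The joint objective L = alpha*L_BCE + beta*L_Dice can be sketched as follows. This is an illustrative pure-Python version over flattened masks, not the authors' training code. The default weights `alpha=0.5` and `beta=0.5`, the `smooth` term, and the clamping constant `eps` are assumptions; this summary does not state the values used in the paper.

```python
import math


def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over flattened pixels."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(probs)


def dice_loss(probs, targets, smooth=1.0):
    """Soft Dice loss: 1 - (2*|P.T| + smooth) / (|P| + |T| + smooth)."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)


def joint_loss(probs, targets, alpha=0.5, beta=0.5):
    """L = alpha * L_BCE + beta * L_Dice (alpha, beta are assumed values)."""
    return alpha * bce_loss(probs, targets) + beta * dice_loss(probs, targets)


if __name__ == "__main__":
    target = [1, 0, 1, 0]
    print(joint_loss([0.9, 0.1, 0.8, 0.2], target))  # confident, mostly right
    print(joint_loss([0.5, 0.5, 0.5, 0.5], target))  # uninformative prediction
```

BCE penalizes each pixel independently, while the Dice term scores overlap of the whole mask, which is why the two are often combined when foreground regions are small.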
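Experimental Results
Research Questions
- RQ1: Does combining Transformer-based self-attention with global spatial attention improve U-Net-based medical image segmentation?
- RQ2: Do multi-scale skip connections (residual/dense) preserve detail better than conventional cascaded connections?
- RQ3: How does TransAttUnet perform against state-of-the-art baselines across multiple modalities and datasets?
- RQ4: How do the SAA module and multi-scale fusion affect segmentation accuracy and boundary precision?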
Key Findings
| Method | Year | DICE | IoU | ACC | REC | PRE |
|---|---|---|---|---|---|---|
| U-Net | 2015 | 67.40 | 54.90 | - | 70.80 | - |
| Attention U-Net | 2018 | 66.50 | 56.60 | - | 71.70 | - |
| R2U-Net | 2018 | 67.90 | 58.10 | - | 79.20 | - |
| Att R2UNet | 2018 | 69.10 | 59.20 | - | 72.60 | - |
| ResUNet* | 2019 | 79.15 | 70.15 | 92.28 | 82.43 | 84.77 |
| Channel-UNet* | 2019 | 84.82 | 75.92 | 94.10 | 94.01 | 81.04 |
| BCDU-Net | 2019 | 85.10 | - | - | - | - |
| FANet | 2021 | 87.31 | 80.23 | - | 86.50 | 92.35 |
| PraNet* | 2021 | 87.46 | 80.23 | 95.37 | 91.28 | 87.59 |
| DoubleU-Net | 2020 | 89.62 | 82.12 | - | 87.80 | 94.59 |
| Swin-Unet* | 2021 | 89.72 | 82.90 | - | 90.32 | 92.04 |
| SegFormer* | 2021 | 90.24 | 83.60 | - | 91.12 | 92.10 |
| MCTrans | 2021 | 90.35 | - | - | - | - |
| TransAttUnet_C | - | 89.25 | 81.46 | 95.06 | 89.90 | 91.59 |
| TransAttUnet_D | - | 90.14 | 83.04 | 96.14 | 90.42 | 92.17 |
| TransAttUnet_R | - | 90.74 | 83.80 | 96.38 | 90.93 | 92.42 |
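- TransAttUnet variants outperform the baseline U-Net and several other baseline models across multiple datasets.
- TransAttUnet_R (residual skip connections) achieves the highest Dice score on ISIC-2018, 90.74%.
- Combining Transformer self-attention (TSA) with global spatial attention (GSA) improves context modeling over using either module alone.
- Multi-scale skip connections (residual or dense) aggregate features better than a single cascaded connection and reduce the loss of detail.
- Against MCTrans on ISIC-2018, TransAttUnet_R improves Dice (90.74% vs. 90.35%).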
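For readers interpreting the table columns, the sketch below computes DICE, IoU, ACC, REC and PRE from a pair of binary masks using their standard confusion-matrix definitions. It is an illustration only, not the paper's evaluation script: the function name `segmentation_metrics` is hypothetical, masks are assumed to be flattened lists of 0/1 values, and returning 0.0 for an empty denominator is an assumed convention. The table reports these quantities as percentages, whereas the function returns fractions.

```python
def _safe_div(numerator, denominator):
    """Return 0.0 when the denominator is zero (assumed convention)."""
    return numerator / denominator if denominator else 0.0


def segmentation_metrics(pred, target):
    """Compute overlap and classification metrics for flattened binary masks."""
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 0)
    return {
        "dice": _safe_div(2 * tp, 2 * tp + fp + fn),
        "iou": _safe_div(tp, tp + fp + fn),
        "acc": _safe_div(tp + tn, tp + fp + fn + tn),
        "rec": _safe_div(tp, tp + fn),  # recall (sensitivity)
        "pre": _safe_div(tp, tp + fp),  # precision
    }


if __name__ == "__main__":
    pred = [1, 1, 0, 0, 1, 0]
    target = [1, 0, 0, 0, 1, 1]
    for name, value in segmentation_metrics(pred, target).items():
        print(f"{name}: {100 * value:.2f}")
```

Dice and IoU are monotonically related (Dice = 2*IoU / (1 + IoU)), so methods rank the same way on a single image under both; the table lists each separately as reported.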
This digest was generated by AI and reviewed by a human editor.