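[Paper Summary] Enhancing Medical Image Segmentation with TransCeption: A Multi-Scale Feature Fusion Approach
TransCeption is a pure Transformer-based U-Net variant. It uses ResInception Patch Merging and Multi-Branch Transformers (MB Transformer), together with Intra-stage Feature Fusion (IFF) and a Dual Transformer Bridge, to fuse multi-scale features for medical image segmentation, and it surpasses state-of-the-art methods on multi-organ CT (Synapse) and skin lesion (ISIC 2018) datasets.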
While CNN-based methods have been the cornerstone of medical image segmentation due to their promising performance and robustness, they are limited in capturing long-range dependencies. Transformer-based approaches currently prevail because they enlarge the receptive field to model global contextual correlations. To extract richer representations, some extensions of the U-Net employ multi-scale feature extraction and fusion modules and obtain improved performance. Inspired by this idea, we propose TransCeption for medical image segmentation, a pure transformer-based U-shaped network featuring an inception-like module in the encoder and a contextual bridge for better feature fusion. The design proposed in this work is based on three core principles: (1) The patch merging module in the encoder is redesigned as ResInception Patch Merging (RIPM), and the multi-branch transformer (MB transformer) uses the same number of branches as RIPM has outputs. Combining the two modules enables the model to capture a multi-scale representation within a single stage. (2) We construct an Intra-stage Feature Fusion (IFF) module following the MB transformer to strengthen the aggregation of feature maps from all branches, with particular focus on the interaction between the channels of all scales. (3) In contrast to a bridge that only contains token-wise self-attention, we propose a Dual Transformer Bridge that also includes channel-wise self-attention to exploit correlations between scales at different stages from a dual perspective. Extensive experiments on multi-organ and skin lesion segmentation tasks demonstrate the superior performance of TransCeption compared to previous work. The code is publicly available at https://github.com/mindflow-institue/TransCeption.
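One practical constraint on an inception-style, multi-branch merging step such as RIPM is that branches with different kernel sizes must downsample to the same spatial size if their outputs are to be fused later. The sketch below checks this with the standard convolution output-size formula. The kernel sizes (3, 5, 7), the stride of 2, the padding rule `k // 2`, and the helpers `conv_out_size` and `branch_shapes` are all assumptions chosen for illustration; this is not the paper's RIPM code.

```python
from typing import Dict, Optional, Sequence, Tuple


def conv_out_size(size: int, kernel: int, stride: int = 2, padding: Optional[int] = None) -> int:
    """Standard convolution output size: floor((n + 2p - k) / s) + 1."""
    if padding is None:
        padding = kernel // 2  # assumed padding rule for odd kernels (hypothetical)
    return (size + 2 * padding - kernel) // stride + 1


def branch_shapes(height: int, width: int, kernels: Sequence[int] = (3, 5, 7),
                  stride: int = 2) -> Dict[int, Tuple[int, int]]:
    """(H, W) produced by each hypothetical downsampling branch, keyed by kernel size."""
    return {k: (conv_out_size(height, k, stride), conv_out_size(width, k, stride))
            for k in kernels}


if __name__ == "__main__":
    shapes = branch_shapes(56, 56)
    print(shapes)  # {3: (28, 28), 5: (28, 28), 7: (28, 28)}
    # Parallel branches can only be fused if they agree on the spatial size.
    assert len(set(shapes.values())) == 1
```

With an odd kernel k and padding (k - 1)/2, the numerator n + 2p - k always equals n - 1, so every branch yields the same size regardless of k. That is one common reason for this padding choice in multi-kernel designs.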
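Research Motivation and Goals
- Improve global context modeling in medical image segmentation beyond what standard CNNs and single-scale Transformers offer.
- Propose a pure Transformer U-Net variant (TransCeption) that fuses multi-scale features both within and across encoder stages.
- Introduce architectural blocks (RIPM, MB transformer, IFF) and a Dual Transformer Bridge to strengthen feature fusion across scales.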
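Proposed Method
- Redesign the encoder's patch merging as ResInception Patch Merging (RIPM) to capture multi-scale representations within a single stage.
- Use a Multi-Branch (MB) Transformer block to process the three parallel feature maps produced by RIPM (3x3, 5x5, 7x7), plus an additional 3x3 branch for local details.
- Introduce Intra-stage Feature Fusion (IFF), which fuses the multi-branch outputs through channel-focused attention that preserves positional information.
- Develop a Dual Transformer Bridge that combines token-aware and channel-aware attention to fuse cross-stage, cross-scale features at the encoder-decoder bridge.
- Adopt a four-stage encoder, overlapped Patch Embedding (OPE), and a decoder with Patch Expanding, all within a pure Transformer framework built from CoaT-style blocks.
- At the bridge, use a token-aware Transformer that performs scale reduction, and a Channel-aware Transformer for cross-scale communication, to keep complexity low.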
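Both IFF and the Channel-aware Transformer in the bridge rely on attention computed across channels rather than across tokens. The pure-Python sketch below shows a generic channel-wise self-attention: for a feature map flattened to N tokens by C channels, the attention matrix is C x C instead of N x N, which is why channel-wise attention is cheaper whenever N is much larger than C. It is a simplification under stated assumptions, not the paper's module: learned query/key/value projections, multiple heads, normalization, and the residual path are omitted, the 1/sqrt(N) scaling is an assumed choice, and all helper names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive product of an (n x k) and a (k x m) matrix."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def softmax_rows(m: Matrix) -> Matrix:
    out = []
    for row in m:
        peak = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def channel_self_attention(x: Matrix) -> Tuple[Matrix, Matrix]:
    """Generic channel-wise self-attention on x of shape (N tokens, C channels).

    Simplifying assumption: queries, keys and values are x itself (no learned
    projections, heads, normalization or residual path).
    """
    n_tokens = len(x)
    x_t = transpose(x)                   # (C x N)
    scores = matmul(x_t, x)              # (C x C) channel-to-channel similarity
    scale = 1.0 / math.sqrt(n_tokens)    # assumed scaling choice
    attn = softmax_rows([[s * scale for s in row] for row in scores])
    # out[t][i] = sum_j attn[i][j] * x[t][j]: each output channel is a convex
    # mix of the input channels, computed token by token, so positions are kept.
    out = matmul(x, transpose(attn))     # (N x C)
    return out, attn


def attention_map_entries(n_tokens: int, channels: int) -> Dict[str, int]:
    """Number of entries in the attention matrix for each attention type."""
    return {"token_wise": n_tokens * n_tokens, "channel_wise": channels * channels}


if __name__ == "__main__":
    x = [[1.0, 0.0, 2.0],
         [0.5, 1.0, 0.0],
         [0.0, 2.0, 1.0],
         [1.0, 1.0, 1.0]]  # N = 4 tokens, C = 3 channels
    out, attn = channel_self_attention(x)
    print(len(out), len(out[0]), len(attn), len(attn[0]))  # 4 3 3 3
    print(attention_map_entries(28 * 28, 64))  # {'token_wise': 614656, 'channel_wise': 4096}
```

For a 28 x 28 feature map with 64 channels, the token-wise map has 614,656 entries versus 4,096 for the channel-wise one. The trade-off is that channel attention mixes whole channels and does not by itself relate spatial positions to each other, which is one motivation for pairing it with token-aware attention in a bridge.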

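Experimental Results
Research Questions
- RQ1: Does fusing multi-scale features within and across encoder stages improve medical image segmentation beyond existing Transformer-based methods?
- RQ2: How can RIPM, the MB transformer, IFF, and the Dual Transformer Bridge be designed to model cross-scale and cross-channel dependencies efficiently?
- RQ3: Can a pure Transformer U-Net built from these multi-scale blocks achieve state-of-the-art multi-organ CT segmentation and skin lesion segmentation results on the evaluated datasets?
- RQ4: How do intra-stage and inter-stage fusion affect boundary accuracy and robustness to noise?
Key Findings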
| Method | DSC (%) ↑ | Aorta | Gallbladder | Kidney (L) | Kidney (R) | Liver | Pancreas | Spleen | Stomach | HD (mm) ↓ |
|---|---|---|---|---|---|---|---|---|---|---|
| V-Net | 68.81 | 75.34 | 51.87 | 77.10 | 80.75 | 87.84 | 40.05 | 80.56 | 56.98 | - |
| DARR | 69.77 | 74.74 | 53.77 | 72.31 | 73.24 | 94.08 | 54.18 | 89.90 | 45.96 | - |
| R50 U-Net | 74.68 | 87.47 | 66.36 | 80.60 | 78.19 | 93.74 | 56.90 | 85.87 | 74.16 | 36.87 |
| U-Net | 76.85 | 89.07 | 69.72 | 77.77 | 68.60 | 93.43 | 53.98 | 86.67 | 75.58 | 39.70 |
| R50 Att-UNet | 75.57 | 55.92 | 63.91 | 79.20 | 72.71 | 93.56 | 49.37 | 87.19 | 74.95 | 36.97 |
| Att-UNet | 77.77 | 89.55 | 68.88 | 77.98 | 71.11 | 93.57 | 58.04 | 87.30 | 75.75 | 36.02 |
| R50 ViT | 71.29 | 73.73 | 55.13 | 75.80 | 72.20 | 91.51 | 45.99 | 81.99 | 73.95 | 32.87 |
| TransUNet | 77.48 | 87.23 | 63.13 | 81.87 | 77.02 | 94.08 | 55.86 | 85.08 | 75.62 | 31.69 |
| TransNorm | 78.40 | 86.23 | 65.10 | 82.18 | 78.63 | 94.22 | 55.34 | 89.50 | 76.01 | 30.25 |
| Swin-Unet | 79.13 | 85.47 | 66.53 | 83.28 | 79.61 | 94.29 | 56.58 | 90.66 | 76.60 | 21.55 |
| TransDeepLab | 80.16 | 86.04 | 69.16 | 84.08 | 79.88 | 93.53 | 61.19 | 89.00 | 78.40 | 21.25 |
| HiFormer | 80.39 | 86.21 | 65.69 | 85.23 | 79.77 | 94.61 | 59.52 | 90.99 | 81.08 | 14.70 |
| MISSFormer | 81.96 | 86.99 | 68.65 | 85.21 | 82.00 | 94.41 | 65.67 | 91.92 | 80.81 | 18.20 |
| TransCeption | 82.24 | 87.60 | 71.82 | 86.23 | 80.29 | 95.01 | 65.27 | 91.68 | 80.02 | 20.89 |
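The table reports two kinds of metrics. The overall DSC column is consistent with the mean of the eight per-organ DSC columns (for TransCeption, (87.60 + 71.82 + 86.23 + 80.29 + 95.01 + 65.27 + 91.68 + 80.02) / 8 = 82.24). HD is a Hausdorff distance between predicted and ground-truth boundaries, where lower is better. The sketch below computes the classic (maximum) Hausdorff distance on 2D binary masks in pixel units. The 4-neighbourhood boundary definition and the helper names are assumptions; converting to physical units would need the scan's voxel spacing, and many segmentation benchmarks report a 95th-percentile variant (HD95) instead, so this is not necessarily how the table's HD values were computed.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def boundary_points(mask: Sequence[Sequence[int]]) -> List[Point]:
    """Foreground pixels touching the background (or the image edge) via a 4-neighbour."""
    h, w = len(mask), len(mask[0])
    points = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c]:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if not (0 <= rr < h and 0 <= cc < w) or not mask[rr][cc]:
                    points.append((float(r), float(c)))
                    break
    return points


def directed_hausdorff(a: List[Point], b: List[Point]) -> float:
    """Largest distance from a point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff_distance(pred_mask: Sequence[Sequence[int]],
                       gt_mask: Sequence[Sequence[int]]) -> float:
    """Symmetric (maximum) Hausdorff distance between mask boundaries, in pixels."""
    a, b = boundary_points(pred_mask), boundary_points(gt_mask)
    if not a or not b:
        raise ValueError("both masks need at least one foreground pixel")
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


if __name__ == "__main__":
    gt = [[0, 0, 0, 0, 0],
          [0, 1, 1, 1, 0],
          [0, 0, 0, 0, 0]]
    pred = [[0, 0, 0, 0, 0],
            [0, 1, 0, 0, 0],
            [0, 0, 0, 0, 0]]
    # gt pixel (1, 3) lies 2 px from the nearest predicted boundary pixel (1, 1).
    print(hausdorff_distance(pred, gt))  # 2.0
```

Because HD takes a maximum, a single stray predicted pixel far from the organ can dominate it, which is why a method can lead on DSC while trailing on HD.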
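- On Synapse multi-organ segmentation, TransCeption achieves the highest average DSC among the compared methods (82.24%) with an HD of 20.89 mm; its HD is lower than that of most baselines but higher than those of HiFormer (14.70) and MISSFormer (18.20).
- On ISIC 2018 skin lesion segmentation, TransCeption achieves DSC 0.9124, ACC 0.9628, SE 0.9192, and SP 0.9744, outperforming several CNN and Transformer baselines.
- The multi-scale RIPM and MB transformer design, combined with IFF, aggregates cross-scale and cross-channel features more effectively within a single stage.
- The Dual Transformer Bridge uses token-aware and channel-aware attention to fuse multi-stage, multi-scale features, improving global context modeling.
- Trained from scratch without pretraining, TransCeption outperforms several pretrained Transformer baselines on the evaluated datasets.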
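The ISIC 2018 metrics listed above (DSC, ACC, SE, SP) can all be derived from pixel-level confusion counts between a predicted and a ground-truth binary mask. The sketch below applies the standard definitions, DSC = 2TP / (2TP + FP + FN), ACC = (TP + TN) / (TP + TN + FP + FN), SE = TP / (TP + FN) and SP = TN / (TN + FP), to flattened masks. Whether the reported numbers were computed per image and averaged or pooled over the whole test set is not stated in this summary, so aggregation is left open; the function names and the NaN convention for undefined ratios are assumptions.

```python
from typing import Dict, Sequence


def confusion_counts(pred: Sequence[int], gt: Sequence[int]) -> Dict[str, int]:
    """Pixel-level TP/FP/FN/TN for two flattened binary masks of equal length."""
    if len(pred) != len(gt):
        raise ValueError("masks must have the same number of pixels")
    tp = sum(1 for p, g in zip(pred, gt) if p and g)
    fp = sum(1 for p, g in zip(pred, gt) if p and not g)
    fn = sum(1 for p, g in zip(pred, gt) if not p and g)
    tn = len(pred) - tp - fp - fn
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


def _ratio(num: float, den: float) -> float:
    # Assumed convention: undefined ratios (e.g. SE with no lesion pixels) become NaN.
    return num / den if den else float("nan")


def segmentation_metrics(pred: Sequence[int], gt: Sequence[int]) -> Dict[str, float]:
    c = confusion_counts(pred, gt)
    tp, fp, fn, tn = c["TP"], c["FP"], c["FN"], c["TN"]
    return {
        "DSC": _ratio(2 * tp, 2 * tp + fp + fn),
        "ACC": _ratio(tp + tn, tp + tn + fp + fn),
        "SE": _ratio(tp, tp + fn),  # sensitivity (recall)
        "SP": _ratio(tn, tn + fp),  # specificity
    }


if __name__ == "__main__":
    pred = [1, 1, 0, 0, 1, 0, 0, 0]
    gt = [1, 0, 0, 0, 1, 1, 0, 0]
    for name, value in segmentation_metrics(pred, gt).items():
        print(f"{name}: {value:.4f}")
    # DSC: 0.6667, ACC: 0.7500, SE: 0.6667, SP: 0.8000
```

Note that ACC and SP are dominated by the (usually large) background, so on lesion images they tend to sit higher than DSC and SE, which matches the ordering of the ISIC numbers above.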

This summary was generated by AI and reviewed by human editors.