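[Paper Summary] TransNetR: Transformer-based Residual Network for Polyp Segmentation with Multi-Center Out-of-Distribution Testing
TransNetR is an encoder-decoder polyp segmentation model. It combines a pre-trained ResNet50 encoder with Residual Transformer blocks to run in real time, and it generalizes strongly to out-of-distribution (OOD) data from multi-center datasets.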
Colonoscopy is considered the most effective screening test to detect colorectal cancer (CRC) and its precursor lesions, i.e., polyps. However, the procedure experiences high miss rates due to polyp heterogeneity and inter-observer dependency. Hence, several deep learning-powered systems have been proposed, given how critical polyp detection and segmentation are in clinical practice. Despite achieving improved outcomes, the existing automated approaches are inefficient in attaining real-time processing speed. Moreover, they suffer from a significant performance drop when evaluated on inter-patient data, especially data collected from different centers. Therefore, we intend to develop a novel real-time deep learning-based architecture, the Transformer-based Residual Network (TransNetR), for colon polyp segmentation and evaluate its diagnostic performance. The proposed architecture, TransNetR, is an encoder-decoder network that consists of a pre-trained ResNet50 as the encoder, three decoder blocks, and an upsampling layer at the end of the network. TransNetR obtains a high Dice coefficient of 0.8706 and a mean Intersection over Union (mIoU) of 0.8016 and retains a real-time processing speed of 54.60 frames per second (FPS) on the Kvasir-SEG dataset. Apart from this, the major contribution of the work lies in exploring the generalizability of TransNetR by testing the proposed algorithm on out-of-distribution datasets (the test distribution is unknown and different from the training distribution). As a use case, we tested our proposed algorithm on the PolypGen (6 unique centers) dataset and two other popular polyp segmentation benchmarking datasets. We obtained state-of-the-art performance on all three datasets during out-of-distribution testing. The source code of TransNetR will be made publicly available at https://github.com/DebeshJha.
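Motivation and Objectives
- Advance real-time, accurate polyp segmentation suitable for clinical use.
- Close the generalization gap that appears when models are tested on data from unseen centers or distributions.
- Propose an enhanced transformer-residual architecture that preserves speed while improving robustness to distribution shift.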
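Proposed Method
- An encoder-decoder architecture that uses a pre-trained ResNet50 as the encoder.
- Four intermediate feature maps taken from the encoder are compressed with 1x1 convolutions and fed into a three-block decoder with skip connections.
- Residual Transformer (RT) blocks fuse convolutional features with transformer-based self-attention.
- The final decoding stage uses a residual block instead of an RT block to reduce the parameter count, followed by upsampling and a 1x1 convolution with a sigmoid activation that outputs the segmentation mask.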
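To make the data flow in the method list concrete, here is a minimal NumPy sketch of three operations it names: a 1x1 convolution that compresses channels, a self-attention step over spatial positions with a residual connection, and an upsample-plus-sigmoid 1x1 convolution head. It is not the authors' implementation. The single-head attention, random weights, 64-channel width, nearest-neighbour upsampling, and the helper names (`conv1x1`, `residual_self_attention`, `upsample2x`, `seg_head`) are all hypothetical, and the paper's RT block may include components (e.g., multi-head attention, normalization, extra convolutions) that this sketch omits. The 2048-channel, stride-32 input is the standard shape of ResNet50's last stage.

```python
import numpy as np

rng = np.random.default_rng(0)


def conv1x1(x, w, b):
    """1x1 convolution on a (C_in, H, W) map: the same channel projection at every pixel."""
    # w has shape (C_out, C_in); b has shape (C_out,)
    return np.einsum("oc,chw->ohw", w, x) + b[:, None, None]


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def residual_self_attention(x, wq, wk, wv):
    """Hypothetical single-head self-attention over H*W positions with a residual add."""
    c, h, w = x.shape
    tokens = x.reshape(c, h * w).T  # (N, C): one token per spatial position
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (N, N) attention weights
    out = attn @ v  # (N, C) attended features
    return x + out.T.reshape(c, h, w)  # residual connection keeps the conv features


def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def seg_head(x, w, b):
    """1x1 convolution to one channel, then a sigmoid: per-pixel polyp probability."""
    return 1.0 / (1.0 + np.exp(-conv1x1(x, w, b)))


# A 256x256 input gives a 2048 x 8 x 8 map at ResNet50's last (stride-32) stage.
feat = rng.standard_normal((2048, 8, 8))
d = 64  # hypothetical compressed channel width
x = conv1x1(feat, rng.standard_normal((d, 2048)) * 0.02, np.zeros(d))
wq, wk, wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
x = residual_self_attention(x, wq, wk, wv)
mask = seg_head(upsample2x(x), rng.standard_normal((1, d)) * 0.1, np.zeros(1))
print(mask.shape)  # (1, 16, 16)
```

Treating each pixel of the low-resolution map as a token keeps the N x N attention matrix small (64 x 64 here); attending at full input resolution would cost quadratically more, which is one reason attention is usually applied to compressed, downsampled features.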

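Experimental Results
Research Questions
- RQ1: How does TransNetR compare with state-of-the-art methods on in-distribution polyp segmentation benchmarks?
- RQ2: Does TransNetR generalize to OOD data from multiple centers and datasets?
- RQ3: What effect does the Residual Transformer block have on segmentation accuracy and model efficiency?
- RQ4: Can the model maintain real-time inference speed while delivering high-quality segmentation across diverse datasets?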
Key Findings
| Method | mIoU | mDSC | Rec. | Prec. | F2 | FPS | Params (M) | FLOPs (G) |
|---|---|---|---|---|---|---|---|---|
| U-Net | 0.7472 | 0.8264 | 0.8504 | 0.8703 | 0.8353 | 106.88 | 31.04 | 54.75 |
| U-Net++ | 0.7420 | 0.8228 | 0.8437 | 0.8607 | 0.8295 | 81.34 | 9.16 | 34.65 |
| ResU-Net++ | 0.5341 | 0.6453 | 0.6964 | 0.7080 | 0.6576 | 43.11 | 4.06 | 15.81 |
| HarDNet-MSEG | 0.7459 | 0.8260 | 0.8485 | 0.8652 | 0.8358 | 34.80 | 33.34 | 6.02 |
| ColonSegNet | 0.6980 | 0.7920 | 0.8193 | 0.8432 | 0.7999 | 73.95 | 5.01 | 62.16 |
| UACANet | 0.7692 | 0.8502 | 0.8799 | 0.8706 | 0.8626 | 25.85 | 69.16 | 31.51 |
| UNeXt | 0.6284 | 0.7318 | 0.7840 | 0.7656 | 0.7507 | 87.47 | 1.47 | 0.57 |
| TransNetR (Ours) | 0.8016 | 0.8706 | 0.8843 | 0.9073 | 0.8744 | 54.60 | 27.27 | 10.58 |
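The table's accuracy columns can be computed from binary masks as sketched below. This is a minimal pure-Python sketch that assumes each metric is computed per image from true positives (TP), false positives (FP), and false negatives (FN) and then averaged over the test set. The helper names and the `eps` smoothing term are hypothetical, and the paper's evaluation code may treat empty masks or averaging differently. F2 is the F-beta score with beta = 2, which weights recall more heavily than precision.

```python
def confusion_counts(pred, gt):
    """TP, FP, FN for two flattened binary masks of equal length (values 0 or 1)."""
    tp = sum(1 for p, g in zip(pred, gt) if p and g)
    fp = sum(1 for p, g in zip(pred, gt) if p and not g)
    fn = sum(1 for p, g in zip(pred, gt) if not p and g)
    return tp, fp, fn


def image_scores(pred, gt, eps=1e-7):
    """Per-image scores; eps (hypothetical) avoids division by zero on empty masks."""
    tp, fp, fn = confusion_counts(pred, gt)
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    beta2 = 2 ** 2  # F-beta with beta = 2
    return {
        "IoU": tp / (tp + fp + fn + eps),
        "DSC": 2 * tp / (2 * tp + fp + fn + eps),
        "Rec.": recall,
        "Prec.": precision,
        "F2": (1 + beta2) * precision * recall / (beta2 * precision + recall + eps),
    }


def mean_scores(pairs):
    """Average per-image scores over (pred, gt) pairs: mIoU and mDSC are these means."""
    per_image = [image_scores(p, g) for p, g in pairs]
    return {k: sum(s[k] for s in per_image) / len(per_image) for k in per_image[0]}


# Toy 4-pixel masks: one half-correct prediction and one perfect prediction.
pairs = [([1, 1, 0, 0], [1, 0, 1, 0]), ([1, 1, 1, 1], [1, 1, 1, 1])]
print({k: round(v, 4) for k, v in mean_scores(pairs).items()})
# {'IoU': 0.6667, 'DSC': 0.75, 'Rec.': 0.75, 'Prec.': 0.75, 'F2': 0.75}
```

Per image, Dice and IoU are linked by DSC = 2 * IoU / (1 + IoU), but their means are not. Plugging TransNetR's mIoU of 0.8016 into that formula gives about 0.890, above its reported mDSC of 0.8706. This gap is expected: the map is concave, so the mean of per-image Dice can be no larger than the value at the mean IoU.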
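- On the Kvasir-SEG test set, TransNetR reaches a Dice coefficient of 0.8706 and an mIoU of 0.8016, with a recall of 0.8843, a precision of 0.9073, and a speed of 54.60 FPS.
- In OOD testing, TransNetR achieves state-of-the-art performance on the PolypGen (6 centers), BKAI-IGH, and CVC-ClinicDB datasets.
- Ablation experiments show that the Residual Transformer (RT) block improves metrics over the variant without RT (e.g., +1.34% mIoU) while still keeping real-time speed.
- Across multiple OOD evaluations, TransNetR consistently outperforms competing methods (including UACANet and UNeXt) on mIoU and DSC in both center-level and dataset-level analyses.
- Center-level results show robust performance on data from different centers, including small and multiple polyps, with accurate boundary delineation.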
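FPS figures such as those reported for TransNetR translate directly into per-frame latency, and they can be measured with a simple timing loop. The sketch below is a hypothetical harness: `measure_fps`, `latency_ms`, the warm-up count, and the stand-in model are assumptions, not the paper's benchmarking protocol, which may differ in batch size, input resolution, and hardware. It times any per-frame callable with Python's `time.perf_counter`.

```python
import time


def latency_ms(fps):
    """Convert throughput in frames per second to milliseconds per frame."""
    return 1000.0 / fps


def measure_fps(infer, frames, warmup=5):
    """Time a per-frame inference callable over `frames`; returns frames per second."""
    for frame in frames[:warmup]:
        infer(frame)  # untimed warm-up runs (lazy initialization, caches)
    start = time.perf_counter()
    for frame in frames:
        infer(frame)
    return len(frames) / (time.perf_counter() - start)


print(round(latency_ms(54.60), 2))  # 18.32 ms per frame at TransNetR's reported FPS

# Stand-in "model": sums a dummy 64x64 frame (replace with real inference).
frames = [[0.5] * (64 * 64) for _ in range(100)]
fps = measure_fps(sum, frames)
print(f"{fps:.1f} FPS, {latency_ms(fps):.3f} ms/frame")
```

When timing a GPU model in PyTorch, call `torch.cuda.synchronize()` before reading the clock. CUDA kernels launch asynchronously, so without it the timer can stop before the work has finished and overstate FPS.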

This summary was generated by AI and reviewed by human editors.