QUICK REVIEW

[Paper Review] 3D UX-Net: A Large Kernel Volumetric ConvNet Modernizing Hierarchical Transformer for Medical Image Segmentation

Ho Hin Lee, Shunxing Bao | arXiv (Cornell University) | Sep 29, 2022

Medical Imaging and Analysis | Cited by 99

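One-Sentence Summary

This paper proposes 3D UX-Net, a pure convolutional network that uses large-kernel depth-wise convolutions to emulate the behavior of hierarchical transformers for volumetric medical image segmentation, achieving state-of-the-art results on public datasets with fewer parameters.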

ABSTRACT

The recent 3D medical ViTs (e.g., SwinUNETR) achieve the state-of-the-art performances on several 3D volumetric data benchmarks, including 3D medical image segmentation. Hierarchical transformers (e.g., Swin Transformers) reintroduced several ConvNet priors and further enhanced the practical viability of adapting volumetric segmentation in 3D medical datasets. The effectiveness of hybrid approaches is largely credited to the large receptive field for non-local self-attention and the large number of model parameters. In this work, we propose a lightweight volumetric ConvNet, termed 3D UX-Net, which adapts the hierarchical transformer using ConvNet modules for robust volumetric segmentation. Specifically, we revisit volumetric depth-wise convolutions with large kernel size (e.g. starting from $7 \times 7 \times 7$) to enable the larger global receptive fields, inspired by Swin Transformer. We further substitute the multi-layer perceptron (MLP) in Swin Transformer blocks with pointwise depth convolutions and enhance model performances with fewer normalization and activation layers, thus reducing the number of model parameters. 3D UX-Net competes favorably with current SOTA transformers (e.g. SwinUNETR) using three challenging public datasets on volumetric brain and abdominal imaging: 1) MICCAI Challenge 2021 FLARE, 2) MICCAI Challenge 2021 FeTA, and 3) MICCAI Challenge 2022 AMOS. 3D UX-Net consistently outperforms SwinUNETR with improvement from 0.929 to 0.938 Dice (FLARE2021) and 0.867 to 0.874 Dice (Feta2021). We further evaluate the transfer learning capability of 3D UX-Net with AMOS2022 and demonstrates another improvement of $2.27\%$ Dice (from 0.880 to 0.900). The source code with our proposed model are available at https://github.com/MASILab/3DUX-Net.

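Motivation and Objectives

Motivate the need for efficient 3D segmentation backbones that balance performance against model size.
Propose a lightweight volumetric ConvNet that uses large-kernel depth-wise convolutions to mimic the behavior of hierarchical transformers.
Reduce the parameter count and the number of normalization layers while maintaining or improving segmentation accuracy.
Demonstrate strong empirical results in both supervised and transfer learning settings on public brain and abdominal datasets.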

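Proposed Method

Introduce volumetric depth-wise convolutions with large kernel (LK) sizes to emulate a large receptive field.
Replace the MLP in the transformer block with pointwise depth-wise convolutional scaling (DCS), which expands features with fewer parameters.
Use an inverted-bottleneck design with depth-wise convolutions that expands and then compresses channel features between layers.
Replace batch normalization with layer normalization in the encoder blocks and use GELU activations.
Build a four-stage encoder with two LK blocks per stage, followed by a ConvNet-based U-shaped decoder with skip connections.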
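
A rough weight count helps show why the depth-wise design stays lightweight. The sketch below is not taken from the authors' code: the channel width of 48, the expansion ratio of 4, and the helper name `conv3d_weights` are illustrative assumptions, and DCS is assumed to be a 1x1x1 convolution with one group per input channel. It compares a dense 7x7x7 convolution with its depth-wise counterpart, and a dense pointwise MLP-style expansion with a grouped one.

```python
def conv3d_weights(in_ch, out_ch, kernel, groups=1):
    """Weight count of a 3D convolution with a cubic kernel (bias excluded)."""
    if in_ch % groups or out_ch % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each output channel only sees in_ch / groups input channels.
    return out_ch * (in_ch // groups) * kernel ** 3


channels = 48    # assumed feature width, for illustration only
expansion = 4    # assumed inverted-bottleneck expansion ratio
kernel = 7       # large kernel size named in the abstract

# Large-kernel layer: dense versus depth-wise (groups == channels).
dense_lk = conv3d_weights(channels, channels, kernel)
depthwise_lk = conv3d_weights(channels, channels, kernel, groups=channels)

# Channel expansion and compression: dense pointwise MLP versus
# grouped pointwise scaling (assumed form of DCS).
hidden = expansion * channels
mlp = conv3d_weights(channels, hidden, 1) + conv3d_weights(hidden, channels, 1)
dcs = (conv3d_weights(channels, hidden, 1, groups=channels)
       + conv3d_weights(hidden, channels, 1, groups=channels))

print(f"7x7x7 dense conv:      {dense_lk:>8,d} weights")
print(f"7x7x7 depth-wise conv: {depthwise_lk:>8,d} weights")
print(f"pointwise MLP:         {mlp:>8,d} weights")
print(f"grouped scaling (DCS): {dcs:>8,d} weights")
```

Under these assumed values, the depth-wise kernel needs 16,464 weights against 790,272 for the dense one, and the grouped scaling needs 384 against 18,432 for the dense MLP. The actual savings in 3D UX-Net depend on its real channel widths and on the decoder, which this sketch does not model.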

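Experimental Results

Research Questions

RQ1: Can a pure ConvNet with large-kernel depth-wise convolutions match or exceed transformer-based 3D segmentation performance?
RQ2: Can the proposed 3D UX-Net reach comparable or better accuracy with fewer parameters and fewer normalization layers?
RQ3: How does 3D UX-Net perform under supervised training and transfer learning on public volumetric datasets?
RQ4: How do kernel size and depth-wise convolutional scaling (DCS) affect segmentation performance across datasets?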

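Key Findings

3D UX-Net outperforms SwinUNETR on FeTA2021 (0.874 vs. 0.867 Dice) and on FLARE2021 (0.934 vs. 0.929 Dice).
In transfer learning on AMOS2022, 3D UX-Net reaches 0.900 Dice, a 2.27% improvement over the best transformer baseline.
Ablation studies show that kernel size and depth-wise scaling both affect performance; LK sizes from roughly 7×7×7 to 13×13×13 give the best gains across datasets.
3D UX-Net converges faster on FeTA2021 and shows robust transfer learning behavior on AMOS2022.
Compared with several transformer baselines, the model achieves competitive Dice scores with fewer parameters (53.0M).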
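
For readers less familiar with the metric, the sketch below shows how a Dice score is computed on binary masks and how the 2.27% figure can be read. It is a minimal illustration, not the paper's evaluation code: the function names and the toy masks are hypothetical, and returning 1.0 for two empty masks is an assumed convention. The 2.27% appears to be a relative gain, since it matches the move from 0.880 to 0.900 quoted in the abstract.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def relative_gain_percent(baseline, improved):
    """Relative improvement of `improved` over `baseline`, in percent."""
    return 100.0 * (improved - baseline) / baseline


# Toy flattened masks (hypothetical data): 3 overlapping voxels, 4 in each mask.
pred = [1, 1, 1, 0, 0, 0, 1, 0]
target = [1, 1, 0, 0, 0, 0, 1, 1]
toy_dice = dice_score(pred, target)

# AMOS2022 transfer learning figures quoted in the abstract.
amos_gain = relative_gain_percent(0.880, 0.900)

print(f"toy Dice: {toy_dice:.2f}")
print(f"AMOS2022 relative gain: {amos_gain:.2f}%")
```

The absolute difference on AMOS2022 is 0.020 Dice; dividing by the 0.880 baseline gives the quoted percentage.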


This review was generated by AI and checked by a human editor.