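[Paper Digest] Swin UNETR: Swin Transformers for Semantic Segmentation of Brain Tumors in MRI Images
Swin UNETR is a U-shaped 3D segmentation model that pairs a Swin Transformer encoder with a CNN decoder for volumetric brain tumor segmentation in multi-modal MRI. It ranked among the top-performing approaches in the BraTS 2021 validation phase.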
Semantic segmentation of brain tumors is a fundamental medical image analysis task involving multiple MRI imaging modalities that can assist clinicians in diagnosing the patient and successively studying the progression of the malignant entity. In recent years, Fully Convolutional Neural Networks (FCNNs) approaches have become the de facto standard for 3D medical image segmentation. The popular "U-shaped" network architecture has achieved state-of-the-art performance benchmarks on different 2D and 3D semantic segmentation tasks and across various imaging modalities. However, due to the limited kernel size of convolution layers in FCNNs, their performance of modeling long-range information is sub-optimal, and this can lead to deficiencies in the segmentation of tumors with variable sizes. On the other hand, transformer models have demonstrated excellent capabilities in capturing such long-range information in multiple domains, including natural language processing and computer vision. Inspired by the success of vision transformers and their variants, we propose a novel segmentation model termed Swin UNEt TRansformers (Swin UNETR). Specifically, the task of 3D brain tumor semantic segmentation is reformulated as a sequence to sequence prediction problem wherein multi-modal input data is projected into a 1D sequence of embedding and used as an input to a hierarchical Swin transformer as the encoder. The swin transformer encoder extracts features at five different resolutions by utilizing shifted windows for computing self-attention and is connected to an FCNN-based decoder at each resolution via skip connections. We have participated in BraTS 2021 segmentation challenge, and our proposed model ranks among the top-performing approaches in the validation phase. Code: https://monai.io/research/swin-unetr
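To make the "five different resolutions" in the abstract concrete, the following minimal sketch works through the shape arithmetic of a hierarchical encoder. The specific numbers are assumptions for illustration and are not stated in the text above: a 128×128×128 crop, 4 input MRI modalities, a patch size of 2, a base embedding width of 48, and four downsampling steps that each halve the spatial size and double the channels. The function names are hypothetical.

```python
def patch_token_dim(in_channels=4, patch_size=2):
    """Raw length of one token before the linear embedding (assumed values)."""
    return in_channels * patch_size ** 3


def swin_unetr_feature_shapes(spatial=(128, 128, 128), patch_size=2,
                              feature_size=48, num_stages=4):
    """Return (spatial_shape, channels, num_tokens) at each encoder resolution."""
    h, w, d = spatial
    shapes = []

    # Patch partition: every non-overlapping patch becomes one token.
    h, w, d = h // patch_size, w // patch_size, d // patch_size
    channels = feature_size
    shapes.append(((h, w, d), channels, h * w * d))

    for _ in range(num_stages):
        # Each stage ends by halving every spatial axis and doubling the channels.
        h, w, d = h // 2, w // 2, d // 2
        channels *= 2
        shapes.append(((h, w, d), channels, h * w * d))
    return shapes


if __name__ == "__main__":
    print("values per raw token:", patch_token_dim())
    for shape, channels, tokens in swin_unetr_feature_shapes():
        print(shape, channels, tokens)
```

Under these assumptions the token count shrinks by a factor of 8 per step, which is what keeps the deeper stages affordable while the skip connections carry the higher-resolution features to the decoder.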
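Research Motivation and Objectives
- Address the challenge of accurate multi-modal brain tumor segmentation by capturing long-range dependencies and multi-scale context in 3D MRI.
- Use hierarchical Swin Transformer encoding to improve segmentation over conventional CNN-based FCNNs.
- Integrate a CNN-based decoder through skip connections at each resolution to preserve fine-grained spatial detail.
- Demonstrate state-of-the-art or competitive performance on the BraTS 2021 benchmark.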
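Proposed Method
- Formulate 3D brain tumor segmentation as a sequence-to-sequence problem in which a Swin Transformer encoder processes multi-modal MRI patches.
- Use a hierarchical Swin Transformer with shifted windows that builds up multi-scale features across four stages.
- Connect encoder features to a CNN-based decoder via skip connections at multiple resolutions within a U-shaped architecture.
- Train with a soft Dice loss and standard BraTS preprocessing, including patch-based training and data augmentation.
- Evaluate with five-fold cross-validation, and ensemble 10 Swin UNETR models to obtain the final BraTS 2021 results.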
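The soft Dice loss mentioned in the method list can be sketched as follows. This is one common formulation (squared terms in the denominator, averaged over output channels), written in NumPy for readability. It is not taken from the authors' code; the function name, the `eps` smoothing term, and the channel-first array layout are assumptions.

```python
import numpy as np


def soft_dice_loss(probs, target, eps=1e-5):
    """Soft Dice loss averaged over classes.

    probs, target: arrays of shape (classes, H, W, D).
    probs holds per-voxel probabilities; target holds 0/1 masks.
    """
    probs = np.asarray(probs, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)

    # Sum over every voxel axis, keeping one value per class.
    axes = tuple(range(1, probs.ndim))
    intersection = (probs * target).sum(axis=axes)
    denominator = (probs ** 2).sum(axis=axes) + (target ** 2).sum(axis=axes)

    # eps keeps the ratio defined when a class is absent from both arrays.
    dice_per_class = (2.0 * intersection + eps) / (denominator + eps)
    return 1.0 - dice_per_class.mean()


if __name__ == "__main__":
    target = np.zeros((3, 4, 4, 4))
    target[:, :2] = 1.0
    print(soft_dice_loss(target, target))        # perfect overlap
    print(soft_dice_loss(1.0 - target, target))  # no overlap
```

Because the loss is computed from probabilities rather than thresholded masks, it stays differentiable, which is why it is usable as a training objective and not only as an evaluation metric.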
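Experimental Results
Research Questions
- RQ1: Can a Swin Transformer encoder combined with a CNN decoder significantly improve 3D multi-modal brain tumor segmentation on BraTS 2021 relative to fully convolutional baselines?
- RQ2: Does hierarchical, shifted-window self-attention effectively capture multi-scale context across different tumor morphologies?
- RQ3: How do multi-resolution skip connections affect segmentation accuracy for the WT, TC, and ET regions?
Key Findings
- Swin UNETR achieves higher average Dice scores across folds for the ET, WT, and TC regions than several competing CNN-based models.
- The hierarchical Swin Transformer encoder with shifted windows models long-range dependencies and multi-scale context better than ViT-based approaches.
- Ensembling the 10 models obtained from cross-validation further improves performance on the BraTS 2021 validation set.
- On the BraTS 2021 test data, ET and WT performance is close to the validation results, while the TC region shows a slight drop.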
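The region-wise Dice scores and the model ensembling referred to in the findings can be illustrated with a short sketch. Several details here are assumptions and do not come from the text above: the BraTS label convention (1 = necrotic core, 2 = edema, 4 = enhancing tumor), the rule of scoring an empty region as 1.0 when it is empty in both prediction and reference, and fusing models by averaging their probability maps. The function names are hypothetical.

```python
import numpy as np

# Assumed BraTS label convention: 1 = necrotic core, 2 = edema, 4 = enhancing tumor.
REGIONS = {"ET": (4,), "TC": (1, 4), "WT": (1, 2, 4)}


def region_dice(pred_labels, true_labels):
    """Dice score for each nested tumor region, computed from label maps."""
    scores = {}
    for name, labels in REGIONS.items():
        pred = np.isin(pred_labels, labels)
        true = np.isin(true_labels, labels)
        total = pred.sum() + true.sum()
        if total == 0:
            # Assumed convention: a region absent from both maps counts as a match.
            scores[name] = 1.0
        else:
            scores[name] = float(2.0 * np.logical_and(pred, true).sum() / total)
    return scores


def ensemble_probabilities(prob_maps):
    """Average probability maps of identical shape from several models."""
    return np.mean(np.stack(prob_maps, axis=0), axis=0)


if __name__ == "__main__":
    true = np.array([0, 1, 2, 4, 4, 0])
    pred = np.array([0, 1, 2, 4, 0, 0])
    print(region_dice(pred, true))
    print(ensemble_probabilities([np.zeros(3), np.ones(3)]))
```

Since WT contains TC and TC contains ET, a single missed enhancing-tumor voxel lowers all three scores, but it lowers ET the most because ET is the smallest region.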
This digest was generated by AI and reviewed by a human editor.