QUICK REVIEW

[Paper Review] Self-Supervised Pre-Training of Swin Transformers for 3D Medical Image Analysis

Yucheng Tang, Dong Yang | arXiv (Cornell University) | Nov 29, 2021

Medical Imaging and Analysis | Cited by 46

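One-Sentence Summary

The paper introduces Swin UNETR, a model with a 3D transformer-based encoder that is pre-trained on 5,050 CT volumes using self-supervised proxy tasks and, after fine-tuning, achieves state-of-the-art segmentation performance on the BTCV and MSD benchmarks.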

ABSTRACT

Vision Transformers (ViT)s have shown great performance in self-supervised learning of global and local representations that can be transferred to downstream applications. Inspired by these results, we introduce a novel self-supervised learning framework with tailored proxy tasks for medical image analysis. Specifically, we propose: (i) a new 3D transformer-based model, dubbed Swin UNEt TRansformers (Swin UNETR), with a hierarchical encoder for self-supervised pre-training; (ii) tailored proxy tasks for learning the underlying pattern of human anatomy. We demonstrate successful pre-training of the proposed model on 5,050 publicly available computed tomography (CT) images from various body organs. The effectiveness of our approach is validated by fine-tuning the pre-trained models on the Beyond the Cranial Vault (BTCV) Segmentation Challenge with 13 abdominal organs and segmentation tasks from the Medical Segmentation Decathlon (MSD) dataset. Our model is currently the state-of-the-art (i.e. ranked 1st) on the public test leaderboards of both MSD and BTCV datasets. Code: https://monai.io/research/swin-unetr

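Research Motivation and Goals

Develop a 3D transformer-based encoder (Swin UNETR) suited to medical image analysis.
Design and integrate self-supervised proxy tasks that capture anatomical context (masked volume inpainting, rotation prediction, contrastive learning).
Demonstrate effective pre-training on a large unlabeled CT corpus and transfer to segmentation tasks.
Validate performance on the public BTCV and MSD benchmarks and compare against the previous best results.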

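Proposed Method

Propose Swin UNETR, which pairs a hierarchical 3D Swin Transformer encoder with a CNN-based decoder linked by skip connections.
Pre-train the encoder with three self-supervised proxy tasks: masked volume inpainting, 3D rotation prediction, and contrastive learning.
Use a multi-objective loss L_tot = λ1 L_inpaint + λ2 L_contrast + λ3 L_rot with equal weights (λ1 = λ2 = λ3 = 1).
During pre-training, apply random sub-volume cropping and augmentation to learn ROI-aware representations of the head and neck, chest, and abdomen/pelvis regions.
Fine-tune the pre-trained encoder on BTCV multi-organ segmentation and on the MSD tasks, using the 4-stage Swin Transformer encoder together with the CNN decoder and skip connections.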
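
The equally weighted multi-objective loss described above can be illustrated with a small, dependency-free Python sketch. This is not the authors' implementation: the helper names (`mask_subvolume`, `l1_error`, `total_loss`), the use of a mean absolute error as the inpainting term, the nested-list volume, and the placeholder values for the contrastive and rotation losses are all assumptions made for illustration.

```python
def mask_subvolume(volume, start, size, fill=0.0):
    """Return a copy of a D x H x W nested-list volume with one cuboid set to `fill`.

    Hypothetical helper: a stand-in for the masking used by an inpainting proxy task.
    """
    z0, y0, x0 = start
    dz, dy, dx = size
    masked = [[row[:] for row in plane] for plane in volume]
    for z in range(z0, min(z0 + dz, len(masked))):
        for y in range(y0, min(y0 + dy, len(masked[z]))):
            for x in range(x0, min(x0 + dx, len(masked[z][y]))):
                masked[z][y][x] = fill
    return masked


def l1_error(pred, target):
    """Mean absolute voxel difference (assumed form of the inpainting loss)."""
    diffs = [
        abs(p - t)
        for pred_plane, target_plane in zip(pred, target)
        for pred_row, target_row in zip(pred_plane, target_plane)
        for p, t in zip(pred_row, target_row)
    ]
    return sum(diffs) / len(diffs)


def total_loss(l_inpaint, l_contrast, l_rot, lam1=1.0, lam2=1.0, lam3=1.0):
    """L_tot = lam1 * L_inpaint + lam2 * L_contrast + lam3 * L_rot, equal weights by default."""
    return lam1 * l_inpaint + lam2 * l_contrast + lam3 * l_rot


# Toy 4 x 4 x 4 volume of ones; mask a 2 x 2 x 2 cuboid in its interior.
volume = [[[1.0] * 4 for _ in range(4)] for _ in range(4)]
masked = mask_subvolume(volume, start=(1, 1, 1), size=(2, 2, 2))

# Treat the masked input as the "reconstruction" of a model that restores nothing:
# 8 of 64 voxels are off by 1.0.
l_inpaint = l1_error(masked, volume)

# Placeholder values for the other two proxy losses (not taken from the paper).
l_tot = total_loss(l_inpaint, l_contrast=0.5, l_rot=0.25)
print(l_inpaint, l_tot)
```

With λ1 = λ2 = λ3 = 1 the total is a plain sum, so no single proxy task is prioritized unless its raw loss happens to be larger in magnitude.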

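Experimental Results

Research Questions

RQ1: Can a 3D Swin Transformer encoder pre-trained with self-supervised tasks learn robust, ROI-aware representations of CT data?
RQ2: Do masked volume inpainting, rotation prediction, and contrastive coding jointly improve downstream 3D medical image segmentation performance?
RQ3: How does pre-training on a large unlabeled CT dataset affect performance, data efficiency, and convergence on the BTCV and MSD benchmarks?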

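Key Findings

With self-supervised pre-training, Swin UNETR reaches state-of-the-art Dice scores on BTCV multi-organ segmentation.
On MSD, Swin UNETR achieves the top performance on several tasks and the best overall Dice across the ten tasks.
The ablation analysis shows that combining all proxy tasks gives the best Dice (84.72% on BTCV in the authors' study), with inpainting providing the strongest boost among the single tasks.
Pre-training reduces annotation effort and yields higher performance when labeled data is scarce (for example, with only 10% of the BTCV labels it gives roughly a 10% Dice improvement).
Increasing the amount of pre-training data and using all proxy tasks speeds up convergence and improves downstream accuracy.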
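
Because the findings above are all reported as Dice scores, a minimal sketch of the standard Dice coefficient for binary masks may help. The function name `dice_score`, the flat-list mask representation, and the convention of returning 1.0 when both masks are empty are assumptions for illustration and are not taken from the paper or its evaluation code.

```python
def dice_score(pred, target):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for two flat binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Both masks are empty; treating this as perfect agreement is a convention.
        return 1.0
    return 2.0 * intersection / total


# Toy flattened masks: 3 predicted voxels, 3 ground-truth voxels, 2 in common.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
print(round(dice_score(pred, target), 4))  # 2 * 2 / (3 + 3)
```

For multi-organ benchmarks such as BTCV, a score like this is typically computed per organ and then averaged, which is how a single summary percentage can be reported.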


This review was generated by AI and checked by a human editor.