QUICK REVIEW

[Paper Review] Swin UNETR: Swin Transformers for Semantic Segmentation of Brain Tumors in MRI Images

Ali Hatamizadeh, Vishwesh Nath|arXiv (Cornell University)|2022. 01. 04.

Advanced Neural Network Applications|Citations: 30

One-Line Summary

Swin UNETR is a U-shaped 3D segmentation model for brain tumors in multi-modal MRI. It pairs a Swin Transformer encoder with a CNN decoder and ranks among the top-performing approaches in the BraTS 2021 validation phase.

ABSTRACT

Semantic segmentation of brain tumors is a fundamental medical image analysis task involving multiple MRI imaging modalities that can assist clinicians in diagnosing the patient and successively studying the progression of the malignant entity. In recent years, Fully Convolutional Neural Networks (FCNNs) approaches have become the de facto standard for 3D medical image segmentation. The popular "U-shaped" network architecture has achieved state-of-the-art performance benchmarks on different 2D and 3D semantic segmentation tasks and across various imaging modalities. However, due to the limited kernel size of convolution layers in FCNNs, their performance of modeling long-range information is sub-optimal, and this can lead to deficiencies in the segmentation of tumors with variable sizes. On the other hand, transformer models have demonstrated excellent capabilities in capturing such long-range information in multiple domains, including natural language processing and computer vision. Inspired by the success of vision transformers and their variants, we propose a novel segmentation model termed Swin UNEt TRansformers (Swin UNETR). Specifically, the task of 3D brain tumor semantic segmentation is reformulated as a sequence to sequence prediction problem wherein multi-modal input data is projected into a 1D sequence of embedding and used as an input to a hierarchical Swin transformer as the encoder. The swin transformer encoder extracts features at five different resolutions by utilizing shifted windows for computing self-attention and is connected to an FCNN-based decoder at each resolution via skip connections. We have participated in BraTS 2021 segmentation challenge, and our proposed model ranks among the top-performing approaches in the validation phase. Code: https://monai.io/research/swin-unetr

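Research Motivation and Goals

Address accurate segmentation of brain tumors in multi-modal 3D MRI by capturing long-range dependencies and multi-scale context.
Use hierarchical Swin Transformer encoding to improve segmentation performance over conventional CNN-based FCNNs.
Integrate a CNN-based decoder with skip connections to preserve fine spatial detail across resolutions.
Demonstrate state-of-the-art or competitive performance on the BraTS 2021 benchmark.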

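Proposed Method

Formulates 3D brain tumor segmentation as a sequence-to-sequence problem, with a Swin Transformer encoder processing multi-modal MRI patches.
Uses a hierarchical Swin Transformer with shifted windows to extract multi-scale features across four stages.
Connects encoder features to a CNN-based decoder through skip connections at multiple resolutions in a U-shaped architecture.
Trains with a soft Dice loss, using standard BraTS preprocessing together with patch-based training and data augmentation.
Evaluates with five-fold cross-validation and ensembles 10 Swin UNETR models for the final BraTS 2021 results.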
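
The shifted-window step in the encoder can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the toy 4 x 4 x 4 token grid, and the window size of 2 are assumptions chosen for readability. It only shows how tokens are grouped into local 3D windows and how a cyclic shift by half a window regroups them so that the next attention layer mixes information across the previous window borders. The attention computation itself, and the masking that real Swin implementations apply to tokens wrapped around by the shift, are omitted.

```python
import numpy as np


def window_partition_3d(x, window):
    """Split a (D, H, W, C) token grid into non-overlapping cubic windows.

    Returns an array of shape (num_windows, window**3, C), so that
    self-attention could be computed independently inside each window.
    """
    d, h, w, c = x.shape
    assert d % window == 0 and h % window == 0 and w % window == 0
    x = x.reshape(d // window, window, h // window, window, w // window, window, c)
    # Move the three window-index axes to the front, then flatten each window.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    return x.reshape(-1, window ** 3, c)


def shifted_window_partition_3d(x, window):
    """Cyclically shift the grid by half a window, then partition it.

    Hypothetical helper for illustration; the attention mask for
    wrapped-around tokens is intentionally left out.
    """
    shift = window // 2
    shifted = np.roll(x, shift=(-shift, -shift, -shift), axis=(0, 1, 2))
    return window_partition_3d(shifted, window)


# Toy token grid: 4 x 4 x 4 positions with one feature channel.
# Each value is the flat index of its position, which makes the grouping visible.
tokens = np.arange(4 * 4 * 4, dtype=np.float32).reshape(4, 4, 4, 1)

regular = window_partition_3d(tokens, window=2)
shifted = shifted_window_partition_3d(tokens, window=2)

print(regular.shape, shifted.shape)
print(regular[0, :, 0])  # tokens from a single corner window
print(shifted[0, :, 0])  # tokens that straddle the original window borders
```

Alternating the regular and shifted partitions between consecutive layers is what lets window-local attention exchange information across windows without attending over the whole volume at once.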

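Experimental Results

Research Questions

RQ1: Can a Swin Transformer-based encoder with a CNN decoder improve 3D multi-modal brain tumor segmentation over fully convolutional baselines on BraTS 2021?
RQ2: Does hierarchical shifted-window self-attention effectively capture multi-scale context across diverse tumor morphologies?
RQ3: How do multi-resolution skip connections affect segmentation accuracy for the whole tumor (WT), tumor core (TC), and enhancing tumor (ET) regions?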

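Key Results

Swin UNETR achieved higher average Dice scores than competitive CNN-based models for the ET, WT, and TC regions across multiple cross-validation folds.
The hierarchical Swin Transformer encoder with shifted windows improves modeling of long-range dependencies and multi-scale context compared with ViT-based approaches.
Ensembling 10 models from cross-validation further improves performance on the BraTS 2021 validation set.
On the BraTS 2021 test data, ET and WT performance is close to the validation benchmark, with a slight drop for the TC region.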
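
The region-wise Dice scores reported above can be reproduced in spirit with a short sketch. It assumes the commonly used BraTS label convention (1 = necrotic core, 2 = edema, 4 = enhancing tumor), which this review does not spell out, and it treats two empty masks as perfect agreement, which is a convention chosen here rather than something taken from the paper. The function names and the toy label volumes are hypothetical.

```python
import numpy as np

# Assumed BraTS label convention (not stated in this review):
# 1 = necrotic tumor core, 2 = peritumoral edema, 4 = enhancing tumor.
REGIONS = {
    "ET": (4,),        # enhancing tumor
    "TC": (1, 4),      # tumor core
    "WT": (1, 2, 4),   # whole tumor
}


def region_mask(labels, region):
    """Return a boolean mask of the voxels that belong to the named region."""
    return np.isin(labels, REGIONS[region])


def dice_score(pred, target):
    """Dice overlap between two binary masks of the same shape."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Both masks are empty: counted as perfect agreement (assumed convention).
        return 1.0
    return float(2.0 * intersection / denom)


# Toy 2 x 2 x 2 label volumes standing in for a ground truth and a prediction.
target = np.array([0, 1, 1, 2, 2, 4, 4, 0]).reshape(2, 2, 2)
pred = np.array([0, 1, 2, 2, 2, 4, 0, 0]).reshape(2, 2, 2)

scores = {
    name: dice_score(region_mask(pred, name), region_mask(target, name))
    for name in REGIONS
}
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Because the three regions are nested, a single mislabeled voxel can change the TC and ET scores while leaving WT untouched, which is one reason the regions are reported separately.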

This review was generated by AI and reviewed by a human editor.