QUICK REVIEW

[Paper Review] DS-TransUNet: Dual Swin Transformer U-Net for Medical Image Segmentation

Ailiang Lin, Bingzhi Chen|arXiv (Cornell University)|2021. 06. 12.

Advanced Neural Network Applications | Citations: 41

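One-line summary

DS-TransUNet introduces dual-scale Swin Transformer encoders and a Transformer Interactive Fusion module within a U-shaped architecture to capture long-range dependencies and multi-scale context in medical image segmentation, and it achieves state-of-the-art performance on multiple datasets, including polyp segmentation, ISIC 2018, GLAS, and the 2018 Data Science Bowl.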

ABSTRACT

Automatic medical image segmentation has made great progress benefit from the development of deep learning. However, most existing methods are based on convolutional neural networks (CNNs), which fail to build long-range dependencies and global context connections due to the limitation of receptive field in convolution operation. Inspired by the success of Transformer in modeling the long-range contextual information, some researchers have expended considerable efforts in designing the robust variants of Transformer-based U-Net. Moreover, the patch division used in vision transformers usually ignores the pixel-level intrinsic structural features inside each patch. To alleviate these problems, we propose a novel deep medical image segmentation framework called Dual Swin Transformer U-Net (DS-TransUNet), which might be the first attempt to concurrently incorporate the advantages of hierarchical Swin Transformer into both encoder and decoder of the standard U-shaped architecture to enhance the semantic segmentation quality of varying medical images. Unlike many prior Transformer-based solutions, the proposed DS-TransUNet first adopts dual-scale encoder subnetworks based on Swin Transformer to extract the coarse and fine-grained feature representations of different semantic scales. As the core component for our DS-TransUNet, a well-designed Transformer Interactive Fusion (TIF) module is proposed to effectively establish global dependencies between features of different scales through the self-attention mechanism. Furthermore, we also introduce the Swin Transformer block into decoder to further explore the long-range contextual information during the up-sampling process. Extensive experiments across four typical tasks for medical image segmentation demonstrate the effectiveness of DS-TransUNet, and show that our approach significantly outperforms the state-of-the-art methods.

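Motivation and objectives

Integrate Transformer-based long-range context modeling into both the encoder and the decoder of U-Net to improve medical image segmentation.
Propose dual-scale Swin Transformer encoders to extract coarse and fine-grained feature representations.
Develop a Transformer Interactive Fusion (TIF) module that fuses multi-scale features globally.
Introduce Swin Transformer blocks into the decoder so that up-sampling can exploit long-range dependencies.
Demonstrate robustness across four medical segmentation tasks and datasets.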

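Proposed method

Use a dual-branch Swin Transformer encoder that operates on large and small patches to obtain coarse and fine features.
Introduce Transformer Interactive Fusion (TIF), which fuses the multi-scale encoder features through self-attention.
Integrate Swin Transformer blocks into each decoder stage to recover spatial resolution together with global context.
Improve convergence through multi-scale training and deep supervision, which applies loss terms to intermediate outputs.
Train and evaluate on the polyp segmentation, ISIC 2018, GLAS, and 2018 Data Science Bowl datasets.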
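The first two steps above (dual-scale patch division and attention-based fusion) can be sketched in plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: the review only says that features from the two scales are fused via self-attention, so the design below (average-pool the other branch into one summary token, prepend it, run self-attention, drop the summary token) is one plausible reading. The function names `patchify`, `embed`, `self_attention`, and `fuse` are hypothetical, the chunk-averaging `embed` stands in for a learned linear embedding, the attention uses identity Q/K/V projections, and the patch sizes 4 and 2 on an 8x8 image are chosen only for illustration.

```python
import math


def patchify(image, patch):
    """Split a 2-D grid into non-overlapping patch x patch tiles, each flattened row-major."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tokens.append([image[r][c]
                           for r in range(top, top + patch)
                           for c in range(left, left + patch)])
    return tokens


def embed(tokens, dim):
    """Assumption: stand-in for a learned projection; averages equal chunks down to `dim` features."""
    size = len(tokens[0]) // dim
    assert size * dim == len(tokens[0]), "token length must be divisible by dim"
    return [[sum(t[i * size:(i + 1) * size]) / size for i in range(dim)] for t in tokens]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention with identity Q/K/V (no learned weights)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def fuse(own_tokens, other_tokens):
    """Prepend the average-pooled summary of the other branch, attend, then drop the summary."""
    d = len(other_tokens[0])
    summary = [sum(t[j] for t in other_tokens) / len(other_tokens) for j in range(d)]
    attended = self_attention([summary] + own_tokens)
    return attended[1:]


# Toy 8x8 "image" with values in [0, 1].
image = [[(r * 8 + c) / 63 for c in range(8)] for r in range(8)]
coarse = embed(patchify(image, 4), 4)  # large patches -> few tokens
fine = embed(patchify(image, 2), 4)    # small patches -> many tokens
fine_fused = fuse(fine, coarse)
coarse_fused = fuse(coarse, fine)
print(len(coarse), len(fine), len(fine_fused), len(coarse_fused))  # prints: 4 16 16 4
```

Each branch keeps its own token count after fusion, so the fused features can still be passed to a decoder stage of the matching resolution. A real model would use learned projections, multiple heads, and windowed attention as in Swin Transformer.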

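Experimental results

Research questions

RQ1: Can a dual-scale Swin Transformer encoder improve multi-scale feature learning for medical image segmentation?
RQ2: Does the Transformer-based fusion module (TIF) effectively integrate coarse and fine features across scales?
RQ3: Does introducing Swin Transformer blocks into the decoder improve up-sampling by adding global context?
RQ4: How well does DS-TransUNet perform on diverse medical segmentation tasks compared with state-of-the-art methods?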

Key results

DS-TransUNet variants outperform previous SOTA methods on polyp segmentation across multiple datasets.
On the Kvasir polyp dataset, DS-TransUNet-L achieves mDice 0.913, mIoU 0.859, recall 0.936, and precision 0.916.
On ClinicDB, DS-TransUNet-L achieves F1 0.9422, mIoU 0.8939, recall 0.9500, and precision 0.9369.
On unseen polyp segmentation datasets, DS-TransUNet generalizes strongly and outperforms competing methods by a considerable margin.
Across the segmentation tasks (polyp, ISIC 2018, GLAS, and DS Bowl), it shows consistent improvements over TransFuse and other baselines.
Qualitative results suggest sharper boundary delineation and robustness on difficult polyps.
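For readers unfamiliar with the metrics quoted above, the sketch below shows how Dice, IoU, recall, and precision are commonly computed from a pair of binary masks. It is a minimal illustration, not the paper's evaluation code: the function names and the `eps` smoothing term are assumptions, and the "m" in mDice and mIoU is presumably a mean over test images, which is not shown here. For binary masks the Dice coefficient and the F1 score are the same formula, which is why the two datasets can report one or the other.

```python
def confusion_counts(pred, target):
    """Count true positives, false positives, and false negatives over two binary masks."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn


def segmentation_metrics(pred, target, eps=1e-8):
    """Per-image overlap metrics; `eps` (an assumption) avoids division by zero.

    Note: with two empty masks every metric evaluates to 0 here; some
    evaluation scripts define that case as 1 instead.
    """
    tp, fp, fn = confusion_counts(pred, target)
    return {
        "dice": 2 * tp / (2 * tp + fp + fn + eps),  # identical to F1 for binary masks
        "iou": tp / (tp + fp + fn + eps),
        "recall": tp / (tp + fn + eps),
        "precision": tp / (tp + fp + eps),
    }


# Toy 2x3 masks: 2 true positives, 1 false positive, 1 false negative.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
metrics = segmentation_metrics(pred, target)
print({name: round(value, 3) for name, value in metrics.items()})
# prints: {'dice': 0.667, 'iou': 0.5, 'recall': 0.667, 'precision': 0.667}
```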

This review was generated by AI and checked by a human editor.