QUICK REVIEW

[Paper Review] TransBTSV2: Towards Better and More Efficient Volumetric Segmentation of Medical Images

Jiangyun Li, Wenxuan Wang | arXiv (Cornell University) | 2022-01-30

Radiomics and Machine Learning in Medical Imaging | Citations: 33

One-Line Summary

TransBTSV2 is a hybrid CNN-Transformer framework for efficient volumetric segmentation of 3D medical images. It pairs a wider, shallower Transformer design with a Deformable Bottleneck Module and outperforms its predecessor, TransBTS.

ABSTRACT

Transformer, benefiting from global (long-range) information modeling using self-attention mechanism, has been successful in natural language processing and computer vision recently. Convolutional Neural Networks, capable of capturing local features, are difficult to model explicit long-distance dependencies from global feature space. However, both local and global features are crucial for dense prediction tasks, especially for 3D medical image segmentation. In this paper, we present the further attempt to exploit Transformer in 3D CNN for 3D medical image volumetric segmentation and propose a novel network named TransBTSV2 based on the encoder-decoder structure. Different from TransBTS, the proposed TransBTSV2 is not limited to brain tumor segmentation (BTS) but focuses on general medical image segmentation, providing a stronger and more efficient 3D baseline for volumetric segmentation of medical images. As a hybrid CNN-Transformer architecture, TransBTSV2 can achieve accurate segmentation of medical images without any pre-training, possessing the strong inductive bias as CNNs and powerful global context modeling ability as Transformer. With the proposed insight to redesign the internal structure of Transformer block and the introduced Deformable Bottleneck Module to capture shape-aware local details, a highly efficient architecture is achieved with superior performance. Extensive experimental results on four medical image datasets (BraTS 2019, BraTS 2020, LiTS 2017 and KiTS 2019) demonstrate that TransBTSV2 achieves comparable or better results compared to the state-of-the-art methods for the segmentation of brain tumor, liver tumor as well as kidney tumor. Code will be publicly available at https://github.com/Wenxuan-1119/TransBTS.

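Research Motivation and Goals

Motivates improving volumetric segmentation by exploiting global context while preserving local 3D detail.
Proposes a wider Transformer design that reduces model complexity without sacrificing performance.
Introduces Deformable Bottleneck Modules to capture irregular, shape-aware lesion details.
Provides a general 3D CNN-Transformer framework that needs no pre-training and applies beyond brain tumors.
Shows competitive or superior performance on several medical image segmentation benchmarks.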

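Proposed Method

Uses a modified 3D CNN encoder with downsampling to extract local volumetric features.
Embeds the 3D features into the Transformer encoder by expanding the channel dimension and flattening the spatial and depth dimensions into tokens.
Applies a redesigned Transformer block with flexibly widened multi-head self-attention (FW-MHSA) and an FFN, forming a shallow but wide architecture.
Inverted-bottleneck-like width expansion: reduces the Transformer depth to a single block and widens its interior, lowering the parameter count and FLOPs.
Integrates a Deformable Bottleneck Module into each skip-connection to capture shape-aware details via 3D deformable convolution.
Restores the Transformer output to a 4D feature map and uses a 3D CNN decoder with skip-connections for high-resolution segmentation.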
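
The flatten-and-restore steps above can be illustrated with plain index arithmetic. This is a minimal sketch, not the authors' code: the 128×128×128 input size, the 8× downsampling factor, the row-major token ordering, and all function names are assumptions made for illustration.

```python
def encoder_grid(input_size, downsample_factor):
    """Spatial grid (D, H, W) left after the CNN encoder's downsampling."""
    return tuple(s // downsample_factor for s in input_size)


def voxel_to_token(dhw, grid):
    """Map a grid position (d, h, w) to its token index (row-major order)."""
    d, h, w = dhw
    _, H, W = grid
    return (d * H + h) * W + w


def token_to_voxel(n, grid):
    """Inverse mapping: token index back to its (d, h, w) grid position."""
    _, H, W = grid
    dh, w = divmod(n, W)
    d, h = divmod(dh, H)
    return (d, h, w)


# Assumed values, not taken from the review: 128^3 input, 8x downsampling.
grid = encoder_grid((128, 128, 128), 8)
num_tokens = grid[0] * grid[1] * grid[2]

print("feature grid:", grid)
print("tokens fed to the Transformer:", num_tokens)
print("round trip:", token_to_voxel(voxel_to_token((1, 2, 3), grid), grid))
```

Because the mapping is a bijection, the Transformer output can be reshaped back into a feature map without losing any positional correspondence, which is what lets the decoder combine it with the skip-connection features.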

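Experimental Results

Research Questions

RQ1: Can a Transformer-based model be integrated into a 3D CNN effectively and efficiently for volumetric medical image segmentation without pre-training?
RQ2: Can a wider, shallower Transformer architecture reduce model complexity while maintaining or improving segmentation performance?
RQ3: Can a deformable mechanism in the skip-connections improve the handling of irregular lesion shapes in 3D medical images?
RQ4: How does TransBTSV2 compare with state-of-the-art methods on brain, liver, and kidney tumor segmentation benchmarks?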

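Key Results

TransBTSV2 achieves competitive or superior performance on the BraTS 2019/2020, LiTS 2017, and KiTS 2019 datasets.
Reducing the Transformer depth to 1 and expanding the internal dimension substantially cuts parameters and FLOPs (e.g., 53.62% fewer parameters and 27.75% fewer FLOPs) while maintaining or improving performance.
The Deformable Bottleneck Module captures shape-aware local details in the skip-connections at almost no additional computational cost.
The architecture remains a clean, general 3D network that does not rely on pre-training and can incorporate additional techniques such as multi-scale feature fusion.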
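
To see why a single wide block can be cheaper than a stack of narrow ones, a rough parameter count helps. The sketch below uses a generic Transformer block (Q/K/V/output projections, a two-layer FFN, two layer norms). The parametrization and every dimension are hypothetical, chosen only to illustrate the trade-off, so the result will not reproduce the percentages reported in the paper.

```python
def block_params(d_model, d_attn, d_ff):
    """Learnable parameters in one generic Transformer block (assumed layout)."""
    # Q, K, V projections (d_model -> d_attn) plus output projection back.
    attn = 3 * (d_model * d_attn + d_attn) + (d_attn * d_model + d_model)
    # Two-layer feed-forward network: d_model -> d_ff -> d_model.
    ffn = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    # Two layer norms, each with a scale and a shift vector.
    norms = 2 * 2 * d_model
    return attn + ffn + norms


def stack_params(depth, d_model, d_attn, d_ff):
    """Parameters in `depth` identical blocks."""
    return depth * block_params(d_model, d_attn, d_ff)


# Hypothetical configurations, not the paper's settings.
deep_narrow = stack_params(depth=4, d_model=512, d_attn=512, d_ff=2048)
shallow_wide = stack_params(depth=1, d_model=512, d_attn=1024, d_ff=4096)

reduction = 1 - shallow_wide / deep_narrow
print(f"deep and narrow : {deep_narrow:,} parameters")
print(f"shallow and wide: {shallow_wide:,} parameters")
print(f"reduction       : {reduction:.1%}")
```

Under these assumptions, doubling the inner widths doubles the per-block cost, but dropping from four blocks to one still roughly halves the total. The count covers only the Transformer blocks, whereas a figure for the full network would also include the CNN encoder and decoder.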


This review was generated by AI and checked by a human editor.