QUICK REVIEW

[Paper Review] UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation

Abdelrahman Shaker, Muhammad Maaz | arXiv (Cornell University) | 2022-12-08

Advanced Neural Network Applications · Citations: 52

One-Line Summary

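UNETR++ introduces an Efficient Paired-Attention (EPA) block that jointly models spatial and channel features in a hierarchical 3D segmentation network, reaching state-of-the-art accuracy with substantially fewer parameters and FLOPs.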

ABSTRACT

Owing to the success of transformer models, recent works study their applicability in 3D medical segmentation tasks. Within the transformer models, the self-attention mechanism is one of the main building blocks that strives to capture long-range dependencies. However, the self-attention operation has quadratic complexity which proves to be a computational bottleneck, especially in volumetric medical imaging, where the inputs are 3D with numerous slices. In this paper, we propose a 3D medical image segmentation approach, named UNETR++, that offers both high-quality segmentation masks as well as efficiency in terms of parameters, compute cost, and inference speed. The core of our design is the introduction of a novel efficient paired attention (EPA) block that efficiently learns spatial and channel-wise discriminative features using a pair of inter-dependent branches based on spatial and channel attention. Our spatial attention formulation is efficient having linear complexity with respect to the input sequence length. To enable communication between spatial and channel-focused branches, we share the weights of query and key mapping functions that provide a complementary benefit (paired attention), while also reducing the overall network parameters. Our extensive evaluations on five benchmarks, Synapse, BTCV, ACDC, BRaTs, and Decathlon-Lung, reveal the effectiveness of our contributions in terms of both efficiency and accuracy. On Synapse, our UNETR++ sets a new state-of-the-art with a Dice Score of 87.2%, while being significantly efficient with a reduction of over 71% in terms of both parameters and FLOPs, compared to the best method in the literature. Code: https://github.com/Amshaker/unetr_plus_plus.
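To make the quadratic-versus-linear point in the abstract concrete, the short calculation below counts attention-map entries for a single head. The token count (a 32×32×16 feature map) and the projection size p = 64 are illustrative assumptions, not values taken from the paper, and `attention_entries` is a hypothetical helper.

```python
from typing import Optional


def attention_entries(num_tokens: int, projected_tokens: Optional[int] = None) -> int:
    """Number of entries in one head's attention map.

    Full self-attention compares every token with every token (n * n).
    Projecting keys/values down to p tokens gives an (n * p) map instead.
    """
    if projected_tokens is None:
        return num_tokens * num_tokens
    return num_tokens * projected_tokens


# Illustrative sizes (assumptions): a 32x32x16 feature map and p = 64.
n = 32 * 32 * 16
full = attention_entries(n)
linear = attention_entries(n, 64)
print(n, full, linear, full // linear)  # 16384 268435456 1048576 256
```

Doubling the number of voxels quadruples the full attention map but only doubles the projected one, which is why a linear formulation matters for volumetric inputs.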

Research Motivation and Goals

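Address the trade-off between segmentation accuracy and model efficiency in 3D medical imaging.
Propose a unified hybrid architecture, built on UNETR, that is efficient in both parameters and compute.
Introduce the Efficient Paired-Attention (EPA) block to better capture spatial and channel dependencies.
Evaluate UNETR++ on multiple benchmarks to demonstrate gains in both accuracy and efficiency.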

Proposed Method

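Introduce a hierarchical encoder-decoder architecture based on UNETR, with four encoder stages and four decoder stages.
Develop the Efficient Paired-Attention (EPA) block: two parallel attention modules (spatial and channel) that share query/key (Q/K) weights but keep separate value (V) paths.
Make spatial attention linear in the number of input tokens by computing it in a lower-dimensional space, where keys and values are projected onto a smaller, fixed number of tokens.
Share Q/K weights between the spatial and channel branches, which reduces parameters and lets the two branches learn complementary features.
Fuse the EPA outputs with 1×1×1 and 3×3×3 convolutions before the final voxel-wise prediction.
Train with a combination of soft Dice loss and cross-entropy loss to optimize segmentation quality.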
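The paired-attention idea in the list above can be sketched in NumPy. This is a minimal single-head sketch under stated assumptions, not the authors' implementation: the helper name `epa_block`, the random token-projection matrix `proj`, scaling the channel logits by sqrt(n), and fusing the two branches by summation are all hypothetical choices, and details such as multiple heads and normalization are omitted.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def epa_block(x, w_q, w_k, w_v_spatial, w_v_channel, proj):
    """Single-head paired attention (hypothetical sketch, not the official code).

    x            : (n, c) flattened voxel tokens
    w_q, w_k     : (c, c) query/key weights SHARED by both branches
    w_v_spatial  : (c, c) value weights for the spatial branch only
    w_v_channel  : (c, c) value weights for the channel branch only
    proj         : (p, n) token projection with p << n (assumption)
    """
    n, c = x.shape
    q = x @ w_q            # (n, c), used by both branches
    k = x @ w_k            # (n, c), used by both branches
    v_s = x @ w_v_spatial  # (n, c)
    v_c = x @ w_v_channel  # (n, c)

    # Spatial branch: keys/values are projected from n tokens to p tokens,
    # so the attention map is (n, p) rather than (n, n): linear in n.
    k_p = proj @ k                                     # (p, c)
    v_p = proj @ v_s                                   # (p, c)
    attn_s = softmax(q @ k_p.T / np.sqrt(c), axis=-1)  # (n, p)
    out_s = attn_s @ v_p                               # (n, c)

    # Channel branch: a (c, c) map whose size does not depend on n.
    # Scaling by sqrt(n) is an assumption of this sketch.
    attn_c = softmax(q.T @ k / np.sqrt(n), axis=-1)    # (c, c)
    out_c = v_c @ attn_c.T                             # (n, c)

    # Summing the two branches is an assumed fusion rule.
    return out_s + out_c


# Illustrative sizes: a 16x16x16 volume (n = 4096 tokens), 32 channels, p = 64.
rng = np.random.default_rng(0)
n, c, p = 16 * 16 * 16, 32, 64
x = rng.standard_normal((n, c))
w_q, w_k, w_v_s, w_v_c = (rng.standard_normal((c, c)) / np.sqrt(c) for _ in range(4))
proj = rng.standard_normal((p, n)) / np.sqrt(n)

y = epa_block(x, w_q, w_k, w_v_s, w_v_c, proj)
print(y.shape)  # (4096, 32)
```

With these sizes the spatial map is 4096×64 instead of 4096×4096, and the channel map is 32×32 regardless of n. Because Q and K are shared, the sketch needs four c×c weight matrices rather than the six that two fully independent attention branches would use.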
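The training objective in the last item can be sketched as follows. The source only states that soft Dice and cross-entropy are combined; the equal 1:1 weighting, the `eps` smoothing constant, averaging Dice over classes, and the helper name `dice_ce_loss` are assumptions of this sketch.

```python
import numpy as np


def softmax(logits: np.ndarray, axis: int = 0) -> np.ndarray:
    """Numerically stable softmax along `axis` (the class axis here)."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def dice_ce_loss(logits: np.ndarray, labels: np.ndarray, eps: float = 1e-5) -> float:
    """Soft Dice + cross-entropy with equal weights (weighting is an assumption).

    logits : (K, D, H, W) unnormalized class scores per voxel
    labels : (D, H, W) integer class indices in [0, K)
    """
    num_classes = logits.shape[0]
    probs = softmax(logits, axis=0)                             # (K, D, H, W)
    one_hot = np.moveaxis(np.eye(num_classes)[labels], -1, 0)   # (K, D, H, W)

    # Voxel-averaged cross-entropy; clipping avoids log(0).
    ce = -np.mean(np.sum(one_hot * np.log(np.clip(probs, eps, 1.0)), axis=0))

    # Soft Dice per class over all voxels, then averaged over classes.
    axes = (1, 2, 3)
    intersection = np.sum(probs * one_hot, axis=axes)
    denom = np.sum(probs, axis=axes) + np.sum(one_hot, axis=axes)
    dice = (2.0 * intersection + eps) / (denom + eps)

    return float((1.0 - dice.mean()) + ce)


rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=(4, 8, 8))
# Confident, correct logits: 20 on the true class, 0 elsewhere.
good_logits = 20.0 * np.moveaxis(np.eye(3)[labels], -1, 0)
random_logits = rng.standard_normal((3, 4, 8, 8))
print(dice_ce_loss(good_logits, labels) < dice_ce_loss(random_logits, labels))  # True
```

The Dice term rewards overlap per class regardless of class size, while the cross-entropy term gives dense per-voxel gradients; combining them is a common way to balance the two.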

Experimental Results

Research Questions

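RQ1: Can the Efficient Paired-Attention (EPA) block reduce computational complexity while maintaining or improving segmentation accuracy?
RQ2: Does the hierarchical UNETR++ architecture, with EPA in both the encoder and the decoder, outperform state-of-the-art 3D medical segmentation methods across diverse benchmarks?
RQ3: How does UNETR++ perform on multiple datasets (Synapse, BTCV, ACDC, BRaTs, Decathlon-Lung) in terms of segmentation accuracy (DSC) and efficiency (parameters, FLOPs)?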

Key Findings

On Synapse, UNETR++ reaches a Dice score of 87.22% with 42.96M parameters and 47.98 GFLOPs, substantially fewer than the baseline UNETR.
Adding EPA to the encoder alone yields 85.17% DSC; adding EPA to the decoder as well raises this to 87.22% DSC, with about 54% fewer parameters and about 37% fewer FLOPs than the baseline.
On Synapse, UNETR++ outperforms nnFormer while using over 70% fewer parameters and FLOPs, giving a better accuracy-efficiency balance.
On BTCV, UNETR++ achieves a mean DSC of 83.28% at 31.0 GFLOPs, comparing favorably with nnUNet's mean DSC of 83.16%, which requires 358 GFLOPs.
On ACDC, UNETR++ reaches a mean DSC of 92.83% (vs. 92.06% for nnFormer and 86.61% for UNETR), showing strong accuracy with higher efficiency.
On BRaTs and Decathlon-Lung, UNETR++ also shows a favorable trade-off between segmentation performance and efficiency compared with recent Transformer-based methods.

This review was generated by AI and reviewed by a human editor.