QUICK REVIEW

[Paper Review] LeViT-UNet: Make Faster Encoders with Transformer for Medical Image Segmentation

Guoping Xu, Xingrong Wu | arXiv (Cornell University) | 2021-07-19

Advanced Neural Network Applications | References: 39 | Citations: 54

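One-Line Summary

LeViT-UNet pairs a LeViT transformer-based encoder with a U-Net-like decoder and fuses multi-scale features from transformer and CNN blocks to achieve fast, accurate 2D medical image segmentation. It shows competitive accuracy and improved boundary prediction on Synapse, and strong generalization on ACDC.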

ABSTRACT

Medical image segmentation plays an essential role in developing computer-assisted diagnosis and therapy systems, yet still faces many challenges. In the past few years, the popular encoder-decoder architectures based on CNNs (e.g., U-Net) have been successfully applied in the task of medical image segmentation. However, due to the locality of convolution operations, they demonstrate limitations in learning global context and long-range spatial relations. Recently, several researchers try to introduce transformers to both the encoder and decoder components with promising results, but the efficiency requires further improvement due to the high computational complexity of transformers. In this paper, we propose LeViT-UNet, which integrates a LeViT Transformer module into the U-Net architecture, for fast and accurate medical image segmentation. Specifically, we use LeViT as the encoder of the LeViT-UNet, which better trades off the accuracy and efficiency of the Transformer block. Moreover, multi-scale feature maps from transformer blocks and convolutional blocks of LeViT are passed into the decoder via skip-connection, which can effectively reuse the spatial information of the feature maps. Our experiments indicate that the proposed LeViT-UNet achieves better performance comparing to various competing methods on several challenging medical image segmentation benchmarks including Synapse and ACDC. Code and models will be publicly available at https://github.com/apple1986/LeViT_UNet.

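Research Motivation and Goals

Motivates combining transformer-based global context with local CNN features to improve medical image segmentation.
Integrates a lightweight LeViT-based encoder into a U-Net-style decoder.
Develops a multi-scale feature fusion strategy that uses both transformer and convolutional features.
Evaluates accuracy and efficiency on several medical segmentation benchmarks.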

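Proposed Method

Uses LeViT as the encoder to extract global context while reducing FLOPs.
Concatenates multi-scale features from the convolutional and transformer blocks at the last stage of the encoder.
Keeps a CNN-based decoder with successive upsampling and skip connections to recover resolution.
Initializes the parameters from weights pre-trained on ImageNet-1k.
Compares three variants, LeViT-UNet-128s, -192, and -384, to study the effect of channel width on performance.
Runs ablation experiments on the presence of transformer blocks, the skip connections, and pre-training to understand their impact.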
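
The encoder, fusion, and decoder steps listed above can be followed as a shape trace. This is only a minimal sketch under stated assumptions, not the authors' implementation: the 224x224 input, the four-layer stride-2 convolutional stem, the channel widths, and the helper names (`trace_encoder`, `fuse_bottleneck`, `trace_decoder`) are all hypothetical and chosen for illustration. It tracks only channel counts and resolutions; no tensors are computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureMap:
    name: str
    channels: int
    size: int  # spatial height and width (square maps assumed)


def halve(size: int) -> int:
    # A stride-2, kernel-3, padding-1 convolution maps size n to ceil(n / 2).
    return (size + 1) // 2


def trace_encoder(input_size, stem_widths, stage_widths):
    """Return (conv_maps, stage_maps) for a conv stem followed by transformer stages."""
    size = input_size
    conv_maps = []
    for i, width in enumerate(stem_widths, start=1):
        size = halve(size)
        conv_maps.append(FeatureMap(f"conv{i}", width, size))
    stage_maps = []
    for i, width in enumerate(stage_widths, start=1):
        if i > 1:  # assumption: every stage after the first downsamples by 2
            size = halve(size)
        stage_maps.append(FeatureMap(f"stage{i}", width, size))
    return conv_maps, stage_maps


def fuse_bottleneck(conv_maps, stage_maps):
    """Model upsampling all transformer outputs to the last conv resolution
    and concatenating them with it along the channel axis."""
    target = conv_maps[-1]
    channels = target.channels + sum(m.channels for m in stage_maps)
    return FeatureMap("bottleneck", channels, target.size)


def trace_decoder(bottleneck, skips, input_size):
    """Each step upsamples to the skip's size, concatenates the skip,
    then projects back to the skip's channel width (assumed design)."""
    current = bottleneck
    steps = []
    for skip in reversed(skips):
        concat_channels = current.channels + skip.channels
        current = FeatureMap(f"dec_{skip.name}", skip.channels, skip.size)
        steps.append((current, concat_channels))
    output = FeatureMap("output", current.channels, input_size)
    return steps, output


# Hypothetical widths, used only to make the trace concrete.
conv_maps, stage_maps = trace_encoder(224, [48, 96, 192, 384], [384, 512, 768])
bottleneck = fuse_bottleneck(conv_maps, stage_maps)
steps, output = trace_decoder(bottleneck, conv_maps[:-1], 224)

for fmap in conv_maps + stage_maps + [bottleneck]:
    print(fmap)
for fmap, concat_channels in steps:
    print(fmap, "concat width:", concat_channels)
print(output)
```

Under these assumptions the transformer stages work at low resolution, where attention is cheapest, while the skip connections from the convolutional stem return the higher-resolution spatial detail to the decoder.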

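Experimental Results

Research Questions

RQ1: Can a LeViT-based encoder improve segmentation accuracy within the U-Net framework while maintaining near-real-time efficiency?
RQ2: Does fusing multi-scale transformer and CNN features improve both global context and local detail?
RQ3: How do the number of transformer channels and the number of skip connections affect segmentation performance and boundary accuracy?
RQ4: How does LeViT-UNet compare with state-of-the-art CNN- and transformer-based methods on standard medical datasets such as Synapse and ACDC?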

Key Results

Method	DSC (%) ↑	HD (mm) ↓	Aorta	Gallbladder	Kidney (L)	Kidney (R)	Liver	Pancreas	Spleen	Stomach	# Params (M)	FLOPs (G)	FPS
V-Net	68.81	-	75.34	51.87	77.10	80.75	87.84	40.05	80.56	56.98	-	-	-
DARR	69.77	-	74.74	53.77	72.31	73.24	94.08	54.18	89.90	45.96	-	-	-
U-Net	76.85	39.70	89.07	69.72	77.77	68.60	93.43	53.98	86.67	75.58	-	-	-
R50 U-Net	74.68	36.87	87.74	63.66	80.60	78.19	93.74	56.90	85.87	74.16	-	-	-
R50 Att-UNet	75.57	36.97	55.92	63.91	79.20	72.71	93.56	49.37	87.19	74.95	-	-	-
R50-Deeplabv3+	75.73	26.93	86.18	60.42	81.18	75.27	92.86	51.06	88.69	70.19	-	-	-
R50 ViT	71.29	32.87	73.73	55.13	75.80	72.20	91.51	45.99	81.99	73.95	-	-	-
TransUnet	77.48	31.69	87.23	63.13	81.87	77.02	94.08	55.86	85.08	75.62	105.28	24.64	50
SwinUnet	79.13	21.55	85.47	66.53	83.28	79.61	94.29	56.58	90.66	76.60	-	-	-
LeViT-UNet-128s	73.69	23.92	86.45	66.13	79.32	73.56	91.85	49.25	79.29	63.70	15.91	17.55	114
LeViT-UNet-192	74.67	18.86	85.69	57.37	79.08	75.90	92.05	53.53	83.11	70.61	19.90	18.92	95
LeViT-UNet-384	78.53	16.84	87.33	62.23	84.61	80.25	93.11	59.07	88.86	72.76	52.17	25.55	85
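
For readers unfamiliar with the two headline metrics in the table, the following is a minimal sketch of how a Dice similarity coefficient (DSC) and a Hausdorff distance (HD) can be computed on small binary masks. It is an illustration, not the evaluation code behind the table: the function names and toy masks are hypothetical, distances are in pixel units (reporting in mm would require the voxel spacing), and the HD here is the plain maximum over all foreground pixels. Many segmentation benchmarks report a 95th-percentile variant computed on surfaces; the table header only says HD, so which variant was used is not assumed here.

```python
import math


def foreground(mask):
    """Return the set of (row, col) coordinates where the mask is non-zero."""
    return {(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v}


def dice_score(pred, target):
    """DSC = 2 * |A intersect B| / (|A| + |B|) for two binary masks."""
    a, b = foreground(pred), foreground(target)
    if not a and not b:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


def hausdorff_distance(pred, target):
    """Symmetric Hausdorff distance between the two foreground point sets."""
    a, b = foreground(pred), foreground(target)
    if not a or not b:
        raise ValueError("Hausdorff distance is undefined for an empty mask")

    def directed(src, dst):
        # Largest distance from any point in src to its nearest point in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)

    return max(directed(a, b), directed(b, a))


# Toy example: the prediction misses the right-most column of the target.
pred = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
target = [
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]

dsc = dice_score(pred, target)
hd = hausdorff_distance(pred, target)
print(f"DSC = {dsc:.2f}, HD = {hd:.2f} px")
```

DSC rewards overlap, so higher is better, while HD penalizes the worst boundary error, so lower is better. This is why a method can rank differently on the two columns.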

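LeViT-UNet-384 achieves a DSC of 78.53% and an HD of 16.84 mm on Synapse, outperforming many SOTA methods in boundary accuracy.
On Synapse, the LeViT-UNet variants achieve competitive per-organ DSC, and LeViT-UNet-384 delivers the best HD (16.84 mm) among the reported methods.
LeViT-UNet-384 shows strong cardiac segmentation on ACDC, with a DSC of 90.32 for RV and 93.76 for LV.
Increasing the number of transformer channels and including transformer blocks consistently improve DSC and HD relative to the non-transformer baseline.
More skip connections generally improve performance, with especially large gains on small organs such as the aorta and gallbladder.
Larger transformer backbones (e.g., LeViT-UNet-384) benefit from pre-training, whereas the smaller variants show mixed effects.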
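
The accuracy and efficiency trade-off in the results table can be made concrete with a little arithmetic. The sketch below copies the four rows that report parameters, FLOPs, and FPS, and expresses each LeViT-UNet variant relative to TransUnet. The dictionary layout and the `compare` helper are illustrative assumptions; the numbers come from the table.

```python
# Values copied from the Synapse results table (only rows with efficiency figures).
RESULTS = {
    "TransUnet":       {"dsc": 77.48, "hd": 31.69, "params_m": 105.28, "flops_g": 24.64, "fps": 50},
    "LeViT-UNet-128s": {"dsc": 73.69, "hd": 23.92, "params_m": 15.91,  "flops_g": 17.55, "fps": 114},
    "LeViT-UNet-192":  {"dsc": 74.67, "hd": 18.86, "params_m": 19.90,  "flops_g": 18.92, "fps": 95},
    "LeViT-UNet-384":  {"dsc": 78.53, "hd": 16.84, "params_m": 52.17,  "flops_g": 25.55, "fps": 85},
}


def compare(name, baseline="TransUnet"):
    """Return differences and ratios of one method against a baseline row."""
    m, b = RESULTS[name], RESULTS[baseline]
    return {
        "dsc_delta": round(m["dsc"] - b["dsc"], 2),        # positive is better
        "hd_delta": round(m["hd"] - b["hd"], 2),           # negative is better
        "params_ratio": round(m["params_m"] / b["params_m"], 3),
        "flops_ratio": round(m["flops_g"] / b["flops_g"], 3),
        "fps_ratio": round(m["fps"] / b["fps"], 2),
    }


for name in ("LeViT-UNet-128s", "LeViT-UNet-192", "LeViT-UNet-384"):
    print(name, compare(name))
```

Read this way, LeViT-UNet-384 runs at 1.7 times TransUnet's FPS with roughly half the parameters and a lower HD, although its reported FLOPs are slightly higher. The two smaller variants trade several DSC points for further gains in speed and model size.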


This review was generated by AI and reviewed by a human editor.