QUICK REVIEW

[Paper Review] Transformer-Unet: Raw Image Processing with Unet

Youyang Sha, Yonghong Zhang|arXiv (Cornell University)|2021. 09. 17.

Advanced Neural Network Applications | References: 29 | Citations: 25

One-line summary

This paper presents TUnet, a network in which Transformer modules process the raw image and feed a Unet-style decoder; on the CT82 pancreas data it achieves better segmentation performance than Unet, Attention Unet, and TransUnet.

ABSTRACT

Medical image segmentation has drawn massive attention as it is important in biomedical image analysis. Good segmentation results can assist doctors with their judgement and further improve patients' experience. Among many available pipelines in medical image analysis, Unet is one of the most popular neural networks as it keeps raw features by adding concatenation between encoder and decoder, which makes it still widely used in industry. Meanwhile, as a popular model which dominates natural language processing tasks, the transformer has now been introduced to computer vision tasks and has seen promising results in object detection, image classification and semantic segmentation. Therefore, the combination of transformer and Unet is supposed to be more efficient than either method working individually. In this article, we propose Transformer-Unet by adding transformer modules on raw images instead of feature maps in Unet, and test our network on the CT82 dataset for pancreas segmentation. We form an end-to-end network and obtain segmentation results better than many previous Unet-based algorithms in our experiment. We demonstrate our network and show our experimental results in this paper.

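Motivation and goals

Integrate Transformer components into the Unet framework so that raw images are processed directly, with the aim of improving medical image segmentation performance.
Combine the transformer's global relationship modeling with Unet's local feature extraction to raise segmentation accuracy on high-resolution CT slices.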

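Proposed method

The raw image is represented as a sequence of patches, and a ViT-like transformer is applied to that sequence.
Patches are embedded with a 1x1 convolution, learnable position embeddings are added, and multi-head self-attention and MLP layers with LayerNorm are applied.
A nearly symmetric Unet encoder-decoder structure is used, and the transformer output is fed to the decoder through concatenation with multi-scale features.
The transformer output is reshaped to match the Unet decoder input and concatenated, and a final bilinear upsampling restores the original resolution.
The network is trained end-to-end with BCE loss for pixel-wise segmentation.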
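
The patch-sequence and loss steps listed above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `patchify`, `unpatchify`, and `bce_loss` are hypothetical, the 8x8 toy image and 4x4 patch size are chosen only for readability, and the code uses plain Python lists where a real model would use tensors and learned embeddings. It shows how an image becomes a row-major token sequence, how a sequence can be laid back onto the spatial grid (the kind of reshaping needed before a decoder can consume it), and how a mean binary cross-entropy is computed.

```python
import math


def patchify(image, patch):
    """Split a 2D image (list of rows) into a row-major sequence of flattened patches."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be divisible by the patch size")
    sequence = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            flat = [image[top + dy][left + dx]
                    for dy in range(patch) for dx in range(patch)]
            sequence.append(flat)
    return sequence


def unpatchify(sequence, height, width, patch):
    """Inverse of patchify: place each token back at its grid position."""
    image = [[0.0] * width for _ in range(height)]
    per_row = width // patch
    for index, flat in enumerate(sequence):
        top, left = (index // per_row) * patch, (index % per_row) * patch
        for offset, value in enumerate(flat):
            image[top + offset // patch][left + offset % patch] = value
    return image


def bce_loss(probabilities, targets, eps=1e-7):
    """Mean binary cross-entropy over flat lists of probabilities and 0/1 labels."""
    total = 0.0
    for p, t in zip(probabilities, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() stays finite
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(targets)


# Toy example (assumed sizes): an 8x8 image split into 4x4 patches gives 4 tokens.
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
tokens = patchify(image, patch=4)
restored = unpatchify(tokens, 8, 8, 4)
```

The sequence length is (H / P) x (W / P), so under the assumption of a 512x512 slice, 16x16 patches would give 1024 tokens; smaller patches lengthen the sequence that self-attention has to process.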

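Experimental results

Research questions

RQ1: Can processing raw CT slices with a transformer improve segmentation performance compared with feature-map-based transformer approaches?
RQ2: Does integrating a transformer directly on raw images and combining it with a Unet decoder outperform Unet, Attention Unet, and TransUnet on pancreas segmentation?
RQ3: How do patch size and Unet backbone depth affect the performance and efficiency of TUnet?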

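Key results

Among the models evaluated on CT82 pancreas segmentation, TUnet achieved the highest mIOU and Dice scores: mIOU 0.8301, Dice 0.7966.
TUnet outperforms Unet (mIOU 0.8113, Dice 0.7689), Attention Unet (mIOU 0.8172, Dice 0.7777), and TransUnet (mIOU 0.7882, Dice 0.7330).
Alongside its segmentation scores, TUnet delivers a pixel accuracy of 0.9983 and a relatively high recall of 0.7676.
TUnet's model size and inference time are somewhat larger than those of Unet and Attention Unet but remain practical (parameters about 548.6 MB; inference about 0.041 s).
The best results were observed with 16x16 patches; larger patches degraded both performance and efficiency, and a deeper Unet backbone benefits the transformer integration.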
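
This summary does not define how the reported metrics were computed. The sketch below shows one common convention, assumed here rather than taken from the paper: Dice and recall on the foreground class, and mIOU averaged over background and foreground. The helper names `confusion_counts` and `segmentation_metrics` are hypothetical. For a single class, Dice is never lower than IoU, so a reported mIOU above Dice (0.8301 vs 0.7966) would be consistent with a class-averaged mIOU that includes the easy background class; that reading is an inference, not something the source states.

```python
def confusion_counts(prediction, target):
    """Count TP, FP, FN, TN over flat lists of 0/1 labels."""
    tp = fp = fn = tn = 0
    for p, t in zip(prediction, target):
        if p and t:
            tp += 1
        elif p:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def segmentation_metrics(prediction, target):
    """Binary segmentation metrics; assumes both classes occur in the target."""
    tp, fp, fn, tn = confusion_counts(prediction, target)
    iou_foreground = tp / (tp + fp + fn)
    iou_background = tn / (tn + fp + fn)
    return {
        "dice": 2 * tp / (2 * tp + fp + fn),
        "miou": (iou_foreground + iou_background) / 2,  # mean over two classes
        "pixel_accuracy": (tp + tn) / (tp + fp + fn + tn),
        "recall": tp / (tp + fn),
    }


# Toy masks: 2 foreground pixels out of 10, one of them predicted correctly.
prediction = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
target = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
metrics = segmentation_metrics(prediction, target)
```

In this toy case Dice is 0.5 while the class-averaged mIOU is about 0.556 and pixel accuracy is 0.8, which illustrates why pixel accuracy can sit near 1.0 when a small organ occupies only a tiny fraction of each slice.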

This review was generated by AI and reviewed by a human editor.