QUICK REVIEW

[Paper Review] STU-Net: Scalable and Transferable Medical Image Segmentation Models Empowered by Large-Scale Supervised Pre-training

Ziyan Huang, Haoyu Wang|arXiv (Cornell University)|2023. 04. 13.

COVID-19 diagnosis using AI | Citations: 49

One-Line Summary

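STU-Net introduces scalable U-Net variants of up to 1.4B parameters, pre-trained on TotalSegmentator, and demonstrates strong transferability on 14 downstream datasets under direct inference and on 3 datasets under fine-tuning.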

ABSTRACT

Large-scale models pre-trained on large-scale datasets have profoundly advanced the development of deep learning. However, the state-of-the-art models for medical image segmentation are still small-scale, with their parameters only in the tens of millions. Further scaling them up to higher orders of magnitude is rarely explored. An overarching goal of exploring large-scale models is to train them on large-scale medical segmentation datasets for better transfer capacities. In this work, we design a series of Scalable and Transferable U-Net (STU-Net) models, with parameter sizes ranging from 14 million to 1.4 billion. Notably, the 1.4B STU-Net is the largest medical image segmentation model to date. Our STU-Net is based on nnU-Net framework due to its popularity and impressive performance. We first refine the default convolutional blocks in nnU-Net to make them scalable. Then, we empirically evaluate different scaling combinations of network depth and width, discovering that it is optimal to scale model depth and width together. We train our scalable STU-Net models on a large-scale TotalSegmentator dataset and find that increasing model size brings a stronger performance gain. This observation reveals that a large model is promising in medical image segmentation. Furthermore, we evaluate the transferability of our model on 14 downstream datasets for direct inference and 3 datasets for further fine-tuning, covering various modalities and segmentation targets. We observe good performance of our pre-trained model in both direct inference and fine-tuning. The code and pre-trained models are available at https://github.com/Ziyan-Huang/STU-Net.

Motivation and Objectives

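Motivate the development of scalable, transferable medical image segmentation models that can handle diverse modalities and targets.
Develop STU-Net variants that improve on nnU-Net for better scalability and transferability.
Pre-train on a large-scale medical segmentation dataset to strengthen transfer to downstream tasks.
Evaluate transferability under both direct inference and fine-tuning across diverse datasets and modalities.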

Proposed Method

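Refine the nnU-Net convolutional blocks with residual connections so that deeper architectures become feasible.
Replace transposed-convolution upsampling with weight-free interpolation followed by a 1x1x1 convolution, for transferability.
Fix the architectural hyperparameters (e.g., number of stages, isotropic kernels) to preserve transferability across tasks.
Scale depth and width jointly (compound scaling) to produce STU-Net-S, STU-Net-B, STU-Net-L, and STU-Net-H, in order of increasing parameter count.
Pre-train for 4,000 epochs with mirror augmentation on the TotalSegmentator CT dataset (104 anatomical structures, 1,204 volumes).
Fine-tune on downstream datasets or run direct inference, adapting channels where necessary.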
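The upsampling change listed above can be illustrated with a small NumPy sketch. It is not the authors' implementation: it assumes nearest-neighbor interpolation as the weight-free step, and the function names (`nearest_upsample_3d`, `pointwise_conv_3d`, `upsample_block`) and tensor sizes are hypothetical. The point it shows is that the only learnable parameters sit in the 1x1x1 convolution, whose shape does not depend on the upsampling factor, so the same weights can be reused when a downstream task needs a different stride.

```python
import numpy as np


def nearest_upsample_3d(x, scale):
    """Weight-free nearest-neighbor upsampling of a (C, D, H, W) array."""
    sd, sh, sw = scale
    x = np.repeat(x, sd, axis=1)  # depth
    x = np.repeat(x, sh, axis=2)  # height
    x = np.repeat(x, sw, axis=3)  # width
    return x


def pointwise_conv_3d(x, weight, bias):
    """1x1x1 convolution: a per-voxel linear map over channels.

    weight has shape (C_out, C_in); bias has shape (C_out,).
    """
    return np.einsum("oc,cdhw->odhw", weight, x) + bias[:, None, None, None]


def upsample_block(x, weight, bias, scale):
    """Interpolate first, then mix channels (assumed order)."""
    return pointwise_conv_3d(nearest_upsample_3d(x, scale), weight, bias)


# Hypothetical sizes: 8 input channels, 4 output channels, 4x4x4 feature map.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4, 4))
weight = rng.standard_normal((4, 8))
bias = np.zeros(4)

# The same weights serve an isotropic and an anisotropic upsampling factor.
y_iso = upsample_block(x, weight, bias, (2, 2, 2))
y_aniso = upsample_block(x, weight, bias, (1, 2, 2))
print(y_iso.shape, y_aniso.shape)
```

A transposed convolution, by contrast, ties its kernel shape to the stride, so its weights would not carry over directly to a task that uses a different upsampling factor.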

Experimental Results

Research Questions

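RQ1: Can STU-Net achieve scalable performance gains on large-scale medical segmentation data by scaling depth and width together?
RQ2: Does replacing task-specific upsampling with weight-free interpolation improve transferability across modalities and tasks?
RQ3: How does large-scale supervised pre-training on TotalSegmentator affect transfer performance on diverse downstream datasets?
RQ4: How does transfer effectiveness differ between direct inference and fine-tuning across multiple CT/MR/PET datasets?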

Key Findings

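STU-Net-H (3x depth, 3x width) reaches 1.4B parameters and achieves the highest mean Dice similarity coefficient (DSC) across the TotalSegmentator classes.
STU-Net-B achieves a higher mean DSC than nnU-Net and SwinUNETR-B on TotalSegmentator, and the margin grows as the model scales to STU-Net-L and STU-Net-H.
Pre-trained STU-Net models transfer effectively to 14 downstream CT datasets under direct inference, with larger models generally achieving a higher mean DSC.
Fine-tuning the pre-trained STU-Net-H (STU-Net-H-ft) on three downstream datasets (including AutoPET) yields the best mean DSC and outperforms the baselines.
The architectural refinements (residual blocks, weight-free upsampling) combined with compound scaling consistently outperform nnU-Net variants under a similar compute budget.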
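Since every finding above is reported as mean DSC, a short sketch of how a per-class Dice score and its mean are typically computed from integer label maps may help. This is a generic illustration, not the paper's evaluation code: the function names are hypothetical, and skipping classes that are absent from both prediction and ground truth is an assumption about how empty classes are handled.

```python
import numpy as np


def dice_score(pred, target, label):
    """DSC for one class: 2 * |P ∩ T| / (|P| + |T|)."""
    p = pred == label
    t = target == label
    denom = p.sum() + t.sum()
    if denom == 0:
        # Assumption: a class absent from both maps is skipped, not scored.
        return float("nan")
    return float(2.0 * np.logical_and(p, t).sum() / denom)


def mean_dice(pred, target, labels):
    """Average the per-class DSC over foreground labels, ignoring NaNs."""
    scores = [dice_score(pred, target, label) for label in labels]
    return float(np.nanmean(scores))


# Toy 2D label maps (0 = background, 1 and 2 = foreground classes).
target = np.array([[0, 1, 1],
                   [0, 2, 2],
                   [0, 0, 2]])
pred = np.array([[0, 1, 0],
                 [0, 2, 2],
                 [0, 2, 2]])

per_class = {label: dice_score(pred, target, label) for label in (1, 2)}
average = mean_dice(pred, target, labels=(1, 2))
print(per_class, average)
```

In the toy example, class 1 is under-segmented (one of two voxels found) and class 2 is over-segmented (one extra voxel), so both scores fall below 1 even though most voxels agree.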


This review was generated by AI and reviewed by a human editor.