QUICK REVIEW

[Paper Review] nnU-Net: Self-adapting Framework for U-Net-Based Medical Image Segmentation

Fabian Isensee, Jens Petersen|arXiv (Cornell University)|Sep 27, 2018
Advanced Neural Network Applications | Cited by 390
One-Sentence Summary

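nnU-Net proposes a self-adapting, fully automated pipeline built around vanilla U-Nets. The pipeline tailors architecture, preprocessing, training, and inference to the specifics of each dataset, and achieves top performance on the Medical Segmentation Decathlon datasets without manual tuning.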

ABSTRACT

The U-Net was presented in 2015. With its straight-forward and successful architecture it quickly evolved to a commonly used benchmark in medical image segmentation. The adaptation of the U-Net to novel problems, however, comprises several degrees of freedom regarding the exact architecture, preprocessing, training and inference. These choices are not independent of each other and substantially impact the overall performance. The present paper introduces the nnU-Net ('no-new-Net'), which refers to a robust and self-adapting framework on the basis of 2D and 3D vanilla U-Nets. We argue the strong case for taking away superfluous bells and whistles of many proposed network designs and instead focus on the remaining aspects that make out the performance and generalizability of a method. We evaluate the nnU-Net in the context of the Medical Segmentation Decathlon challenge, which measures segmentation performance in ten disciplines comprising distinct entities, image modalities, image geometries and dataset sizes, with no manual adjustments between datasets allowed. At the time of manuscript submission, nnU-Net achieves the highest mean dice scores across all classes and seven phase 1 tasks (except class 1 in BrainTumour) in the online leaderboard of the challenge.

Motivation and Objectives

  • Show that non-architectural, automatically configured components can outperform specialized architectures on diverse medical segmentation tasks.
  • Introduce a fully automatic pipeline that adapts U-Net topologies, preprocessing, training, inference, and post-processing to each dataset.
  • Evaluate the framework on the Medical Segmentation Decathlon and compare to state-of-the-art benchmarks.

Proposed Method

  • Use a bank of three U-Net variants (2D U-Net, 3D U-Net, and U-Net Cascade) with only minor architectural changes to the original U-Net.
  • Automatically adapt input patch sizes, memory usage, and pooling operations to each dataset’s geometry.
  • Define a complete automated pipeline: preprocessing (cropping, resampling, normalization), training (loss: dice+CE, Adam optimizer, learning-rate schedule, cross-validation), inference (patch-based tiling with overlap and test-time augmentation), and post-processing (largest connected component).
  • Employ data augmentation (rotations, scaling, elastic deformations, gamma correction, mirroring) and staged training for the cascade when applicable.
  • Evaluate each single model and each two-model ensemble (two out of the three variants) per dataset, then select the best-performing configuration for submission based on cross-validated mean foreground Dice score.
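
The training loss named above (dice + CE) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the summary only names the two terms, so the equal weighting, the negated soft-Dice form averaged over classes, the `eps` stabilizer, and the function name `dice_ce_loss` are all assumptions made for the example.

```python
import math

def dice_ce_loss(probs, labels, num_classes, eps=1e-8):
    """Sum of cross-entropy and a negated soft-Dice term (illustrative sketch).

    probs:  list of per-voxel softmax vectors, e.g. [[0.9, 0.1], [0.2, 0.8]]
    labels: list of integer class ids, one per voxel
    """
    n = len(labels)

    # Cross-entropy: mean negative log-probability assigned to the true class.
    ce = -sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels)) / n

    # Soft Dice per class: 2 * overlap / (predicted mass + true mass).
    dice_sum = 0.0
    for k in range(num_classes):
        overlap = sum(p[k] for p, y in zip(probs, labels) if y == k)
        predicted_mass = sum(p[k] for p in probs)
        true_mass = sum(1 for y in labels if y == k)
        dice_sum += 2.0 * overlap / (predicted_mass + true_mass + eps)

    # Negate the class-averaged Dice so that lower values mean better overlap.
    dice_loss = -dice_sum / num_classes

    # Equal weighting of the two terms is an assumption of this sketch.
    return ce + dice_loss


perfect = dice_ce_loss([[1.0, 0.0], [0.0, 1.0]], [0, 1], num_classes=2)
uncertain = dice_ce_loss([[0.5, 0.5], [0.5, 0.5]], [0, 1], num_classes=2)
```

With this form, a perfect prediction approaches a loss of -1 (zero cross-entropy, Dice of 1), and a maximally uncertain prediction scores higher, so minimizing the sum rewards both calibrated per-voxel probabilities and region overlap.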

Experimental Results

Research Questions

  • RQ1: Can a self-adapting framework using vanilla U-Nets achieve high generalization across heterogeneous medical imaging datasets without manual task-specific tweaks?
  • RQ2: How do automated preprocessing, training, inference, and post-processing components influence segmentation performance relative to architectural innovations?
  • RQ3: What is the impact of model ensembling and cascade strategies on cross-dataset generalization?

Key Findings

  • At the time of manuscript submission, nnU-Net achieved the highest mean Dice scores on the online Medical Segmentation Decathlon leaderboard across all classes of the seven phase 1 tasks, with the single exception of class 1 in BrainTumour.
  • Automatic adaptation to dataset geometry (input patch size, pooling topology) enables the network to operate effectively across varying image sizes and modalities.
  • A cascade approach (3D U-Net cascade) is applied when beneficial, addressing large image sizes and enabling refinement at full resolution.
  • Concrete training protocol details (loss = dice + cross-entropy, Adam optimizer, learning-rate scheduling) and extensive data augmentation contribute substantially to robustness and performance.
  • Inference uses patch-based tiling with overlap and test-time augmentation, aggregating up to 64 predictions per voxel for robustness.
  • An automatic ensembling strategy (two-model ensembles) and selection of the best performing model on cross-validated metrics drive final submission performance.
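
One way to arrive at the figure of up to 64 predictions per voxel is sketched below. It assumes a 3D network, mirroring along every subset of the three spatial axes, and sliding-window patches that overlap by half the patch size. The summary does not spell out this decomposition, so the half-patch stride, the example sizes, and the helper names are assumptions for illustration.

```python
from itertools import combinations

def mirror_axis_sets(ndim):
    """All subsets of spatial axes to flip, including the empty set (no flip)."""
    axes = range(ndim)
    return [subset for r in range(ndim + 1) for subset in combinations(axes, r)]

def max_patch_coverage_1d(image_len, patch_len, stride):
    """Largest number of sliding-window patches covering any position on one axis."""
    counts = [0] * image_len
    for start in range(0, image_len - patch_len + 1, stride):
        for i in range(start, start + patch_len):
            counts[i] += 1
    return max(counts)

ndim = 3

# Mirroring: each axis is either flipped or not, giving 2**3 = 8 variants.
n_mirrors = len(mirror_axis_sets(ndim))

# Tiling: with a stride of half the patch size, an interior position is
# covered by 2 patches per axis (example sizes are hypothetical).
per_axis = max_patch_coverage_1d(image_len=256, patch_len=128, stride=64)
n_patches = per_axis ** ndim

# Every covering patch is predicted once per mirror variant.
max_predictions = n_mirrors * n_patches
```

Under these assumptions the product is 8 mirror variants times 8 overlapping patches, which gives 64. Voxels near the image border are covered by fewer patches, which is why the figure is an upper bound.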


This review was generated by AI and reviewed by a human editor.