QUICK REVIEW

[Paper Review] AbdomenAtlas-8K: Annotating 8,000 CT Volumes for Multi-Organ Segmentation in Three Weeks

Chongyu Qu, Tiezheng Zhang | arXiv (Cornell University) | May 16, 2023

Pancreatic and Hepatic Oncology Research | Cited by 17

One-Sentence Summary

Proposes an active learning workflow to rapidly create AbdomenAtlas-8K, a large multi-organ CT dataset, by combining AI prediction with targeted radiologist revisions, achieving full annotation in three weeks.

ABSTRACT

Annotating medical images, particularly for organ segmentation, is laborious and time-consuming. For example, annotating an abdominal organ requires an estimated rate of 30-60 minutes per CT volume based on the expertise of an annotator and the size, visibility, and complexity of the organ. Therefore, publicly available datasets for multi-organ segmentation are often limited in data size and organ diversity. This paper proposes an active learning method to expedite the annotation process for organ segmentation and creates the largest multi-organ dataset (by far) with the spleen, liver, kidneys, stomach, gallbladder, pancreas, aorta, and IVC annotated in 8,448 CT volumes, equating to 3.2 million slices. The conventional annotation methods would take an experienced annotator up to 1,600 weeks (or roughly 30.8 years) to complete this task. In contrast, our annotation method has accomplished this task in three weeks (based on an 8-hour workday, five days a week) while maintaining a similar or even better annotation quality. This achievement is attributed to three unique properties of our method: (1) label bias reduction using multiple pre-trained segmentation models, (2) effective error detection in the model predictions, and (3) attention guidance for annotators to make corrections on the most salient errors. Furthermore, we summarize the taxonomy of common errors made by AI algorithms and annotators. This allows for continuous revision of both AI and annotations and significantly reduces the annotation costs required to create large-scale datasets for a wider variety of medical imaging tasks.
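The 1,600-week figure can be reproduced with simple arithmetic. The sketch below is one plausible reading, not a calculation taken from the paper: it assumes the rounded count of 8,000 volumes from the title, eight organs annotated one at a time, and the upper bound of 60 minutes per organ. Only the 8-hour, five-day week comes from the abstract; the function name and the 52-week year are illustrative choices.

```python
# Assumptions (not stated in the abstract): 8,000 volumes, 8 organs,
# and the 60-minute upper bound per organ per volume.
VOLUMES = 8_000
ORGANS = 8
MINUTES_PER_ORGAN = 60
HOURS_PER_WEEK = 8 * 5   # 8-hour workday, five days a week (from the abstract)
WEEKS_PER_YEAR = 52      # assumed calendar convention


def conventional_weeks(volumes, organs, minutes_per_organ,
                       hours_per_week=HOURS_PER_WEEK):
    """Working weeks one annotator needs to label every organ in every volume."""
    total_hours = volumes * organs * minutes_per_organ / 60
    return total_hours / hours_per_week


weeks = conventional_weeks(VOLUMES, ORGANS, MINUTES_PER_ORGAN)
years = weeks / WEEKS_PER_YEAR
print(f"{weeks:.0f} weeks, about {years:.1f} years")
```

Under these assumptions the script prints 1600 weeks and about 30.8 years, matching the abstract. Against the three weeks reported for the proposed workflow, that is a reduction of roughly 500-fold.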

Motivation and Objectives

Establish the need for large-scale, fully annotated abdominal CT datasets that support robust multi-organ segmentation across diverse populations and scanners.
Develop an efficient annotation workflow that couples AI predictions with selective human revision to dramatically cut annotation time.
Produce AbdomenAtlas-8K, the largest annotated abdominal CT dataset to date, with eight organs across thousands of volumes.
Provide a framework to assess and reduce label bias and improve AI generalization through multi-model predictions and error-focused attention maps.

Proposed Method

Train three AI segmentation architectures on partially labeled public datasets to generate initial voxel-wise predictions.
Compute a voxel-wise attention map for each CT volume by combining inconsistency, uncertainty, and predicted overlap to highlight potential errors.
Use an eight-step active learning loop where annotators revise top-priority volumes, followed by model fine-tuning and repetition until improvements plateau.
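As a rough illustration of the attention-map and prioritization steps above, the sketch below scores each voxel from the outputs of several models and then ranks volumes for revision. The concrete definitions are assumptions made for this example, not the paper's formulas: inconsistency is taken as the standard deviation across models, uncertainty as the binary entropy of the averaged prediction, overlap as a voxel claimed by more than one organ, and the three terms are combined by an unweighted sum. The names `attention_map` and `rank_volumes` are hypothetical.

```python
import numpy as np


def attention_map(probs, eps=1e-8):
    """Score each voxel by how likely it is to need revision.

    probs: array of shape (models, organs, D, H, W) holding per-organ
    probabilities in [0, 1] from each model.
    """
    mean_p = probs.mean(axis=0)                       # (organs, D, H, W)
    # Inconsistency: disagreement among models, worst organ per voxel.
    inconsistency = probs.std(axis=0).max(axis=0)
    # Uncertainty: binary entropy of the averaged prediction, worst organ.
    p = np.clip(mean_p, eps, 1 - eps)
    entropy = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
    uncertainty = entropy.max(axis=0)
    # Overlap: voxels assigned to more than one organ at once.
    overlap = ((mean_p > 0.5).sum(axis=0) > 1).astype(float)
    # Assumed combination rule: plain unweighted sum of the three terms.
    return inconsistency + uncertainty + overlap


def rank_volumes(maps):
    """Return volume indices ordered from most to least attention mass."""
    scores = [float(m.sum()) for m in maps]
    return sorted(range(len(maps)), key=lambda i: scores[i], reverse=True)


# Toy data: 3 models, 2 organs, 4x4x4 volumes.
agree = np.zeros((3, 2, 4, 4, 4))
agree[:, 0, :2] = 1.0            # all three models agree on organ 0
disagree = agree.copy()
disagree[0, 0, :2] = 0.0         # one model misses organ 0 entirely

order = rank_volumes([attention_map(agree), attention_map(disagree)])
print(order)
```

In this toy case the volume where one model disagrees is ranked first, so it would reach the annotator before the volume on which all models agree. A real pipeline would likely normalize or weight the three terms before summing them.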

Experimental Results

Research Questions

RQ1: Can an active-learning workflow with error-focused attention maps accelerate large-scale, per-voxel abdominal organ annotation?
RQ2: How does aggregating multiple AI model predictions affect label bias and generalization for multi-organ segmentation?
RQ3: What is the practicality and efficiency gain in moving from conventional voxel-level annotation to a guided, human-in-the-loop process?
RQ4: Do the revised annotations improve downstream AI segmentation performance and generalization to unseen data?

Key Findings

AbdomenAtlas-8K comprises 8,448 CT volumes with per-voxel annotations for eight abdominal structures, created in three weeks.
Attention maps based on inconsistency, uncertainty, and overlap effectively localize regions needing human revision with high sensitivity and precision on external data.
The final annotations reduce label bias by averaging three AI predictions instead of relying on a single model, improving cross-model generalization.
AI models trained on AbdomenAtlas-8K achieve comparable average performance to models trained on private hospital data when evaluated on an unseen dataset, indicating strong generalization.
Revised labels and fine-tuning yield measurable improvements in Dice Similarity Coefficient (DSC) and Normalized Surface Distance (NSD) across organs in external validation.
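Two ideas from the findings above, averaging several model predictions and scoring masks with the Dice Similarity Coefficient, can be sketched in a few lines. This is a minimal illustration on a flattened six-voxel toy example, not the paper's pipeline: the 0.5 threshold, the function names, and the probability values are assumptions chosen so that the individual models' errors cancel.

```python
import numpy as np


def average_predictions(prob_maps, threshold=0.5):
    """Average per-model probability maps for one organ, then binarize."""
    return np.mean(prob_maps, axis=0) >= threshold


def dice(pred, target):
    """Dice Similarity Coefficient between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(pred, target).sum() / denom


# Hypothetical reference mask and three model outputs (made-up values).
target = np.array([1, 1, 1, 0, 0, 0])
model_a = np.array([0.9, 0.8, 0.2, 0.1, 0.1, 0.1])  # under-segments
model_b = np.array([0.9, 0.9, 0.9, 0.7, 0.1, 0.1])  # over-segments
model_c = np.array([0.8, 0.9, 0.7, 0.2, 0.1, 0.1])

ensemble = average_predictions([model_a, model_b, model_c])
for name, mask in [("A", model_a >= 0.5), ("B", model_b >= 0.5),
                   ("ensemble", ensemble)]:
    print(name, round(float(dice(mask, target)), 3))
```

In this constructed case model A scores 0.8, model B about 0.857, and the averaged mask matches the reference exactly. The numbers only illustrate how averaging can cancel model-specific errors; they are not results from the paper.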


This review was generated by AI and checked by a human editor.