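[Paper Digest] AbdomenAtlas-8K: Annotating 8,000 CT Volumes for Multi-Organ Segmentation in Three Weeks
Proposes an active learning workflow that combines AI predictions with targeted radiologist revisions to rapidly create AbdomenAtlas-8K, a large multi-organ CT dataset, with all annotation completed in three weeks.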
Annotating medical images, particularly for organ segmentation, is laborious and time-consuming. For example, annotating a single abdominal organ takes an estimated 30-60 minutes per CT volume, depending on the annotator's expertise and the size, visibility, and complexity of the organ. As a result, publicly available datasets for multi-organ segmentation are often limited in size and organ diversity. This paper proposes an active learning method to expedite the annotation process for organ segmentation and creates the largest multi-organ dataset to date, with the spleen, liver, kidneys, stomach, gallbladder, pancreas, aorta, and inferior vena cava (IVC) annotated in 8,448 CT volumes, equating to 3.2 million slices. Conventional annotation methods would take an experienced annotator up to 1,600 weeks (roughly 30.8 years) to complete this task. In contrast, our annotation method accomplished it in three weeks (based on an 8-hour workday, five days a week) while maintaining similar or even better annotation quality. This achievement is attributed to three properties of our method: (1) label bias reduction using multiple pre-trained segmentation models, (2) effective error detection in the model predictions, and (3) attention guidance that directs annotators to correct the most salient errors. Furthermore, we summarize a taxonomy of common errors made by AI algorithms and annotators. This allows for continuous revision of both AI models and annotations and significantly reduces the annotation cost of creating large-scale datasets for a wider variety of medical imaging tasks.
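Research Motivation and Objectives
- Motivation: large-scale, fully annotated abdominal CT datasets are needed for robust multi-organ segmentation across diverse populations and scanners.
- Develop an efficient annotation workflow that combines AI predictions with selective human revision to substantially reduce annotation time.
- Produce AbdomenAtlas-8K, the largest annotated abdominal CT dataset to date, covering eight organs and thousands of volumes.
- Provide a framework that uses multi-model predictions and error-focused attention maps to assess and reduce label bias and improve AI generalization.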
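Proposed Method
- Train three AI segmentation architectures on partially labeled public datasets to generate initial voxel-wise predictions.
- Compute a per-voxel attention map that combines inconsistency, uncertainty, and prediction overlap to highlight likely errors.
- Run an eight-step active learning loop: annotators revise the highest-priority volumes, the models are fine-tuned, and the cycle repeats until improvement plateaus.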
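The attention-map and prioritization steps above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the digest does not give the exact formulas, so the standard deviation across models as the inconsistency term, binary entropy as the uncertainty term, the 0.5 threshold for detecting overlapping organs, the unweighted sum, and the function names `attention_map` and `rank_volumes` are all assumptions.

```python
import numpy as np

def attention_map(probs, eps=1e-8):
    """Per-voxel attention from several models' predictions (assumed formulation).

    probs: array of shape (M, C, D, H, W) holding per-voxel probabilities
           from M models for C organ channels.
    Returns an array of shape (D, H, W); larger values flag likely errors.
    """
    # Inconsistency: how much the models disagree, averaged over organs.
    inconsistency = probs.std(axis=0).mean(axis=0)

    # Uncertainty: binary entropy of the mean prediction, scaled to [0, 1].
    p = np.clip(probs.mean(axis=0), eps, 1.0 - eps)
    entropy = -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p)) / np.log(2.0)
    uncertainty = entropy.mean(axis=0)

    # Overlap: voxels that the mean prediction assigns to more than one organ.
    overlap = ((p > 0.5).sum(axis=0) > 1).astype(float)

    # Unweighted sum of the three cues (an assumption, not the paper's weighting).
    return inconsistency + uncertainty + overlap

def rank_volumes(volume_probs):
    """Order volume ids from highest to lowest mean attention."""
    scores = {vid: float(attention_map(p).mean()) for vid, p in volume_probs.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

In a loop of this kind, annotators would review the volumes at the front of the ranked list first, and the ranking would be recomputed after each round of fine-tuning. Ranking by mean attention is only one simple choice; a sum or a top-percentile score would favor volumes with small but severe errors differently.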
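Experimental Results
Research Questions
- RQ1: Can an active learning workflow with error-focused attention maps accelerate large-scale, voxel-wise annotation of abdominal organs?
- RQ2: How does aggregating multi-model AI predictions affect label bias and generalization in multi-organ segmentation?
- RQ3: What practicality and efficiency gains come from moving from conventional voxel-wise annotation to a guided human-in-the-loop process?
- RQ4: Do the revised annotations improve downstream AI segmentation performance and generalization to unseen data?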
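Key Findings
- AbdomenAtlas-8K comprises 8,448 CT volumes with voxel-wise annotations of eight abdominal structures, completed within three weeks.
- Attention maps built from inconsistency, uncertainty, and overlap localize regions that need human revision with high sensitivity and precision on external data.
- The final annotations reduce label bias by averaging three AI predictions rather than relying on a single model, which improves cross-model generalization.
- When evaluated on unseen datasets, AI models trained on AbdomenAtlas-8K achieve average performance comparable to models trained on private hospital data, indicating strong generalization.
- Revised labels and fine-tuning yield sizable improvements in per-organ Dice similarity coefficient (DSC) and NSD on external validation.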
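To make the averaging and DSC findings above concrete, here is a small sketch of how several models' predictions might be averaged into one label and scored against a reference mask. It is an illustration under stated assumptions: the digest does not specify the aggregation details, so the 0.5 threshold and the helper names `ensemble_label` and `dice` are hypothetical, and NSD is not implemented here.

```python
import numpy as np

def ensemble_label(probs, threshold=0.5):
    """Average per-voxel probabilities over models (axis 0), then threshold.

    The threshold value is an assumption made for this sketch.
    """
    return probs.mean(axis=0) >= threshold

def dice(pred, target):
    """Dice similarity coefficient between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(pred, target).sum() / denom

# Toy example: three models, four voxels, one organ.
probs = np.array([
    [0.9, 0.8, 0.2, 0.1],
    [0.8, 0.4, 0.6, 0.1],
    [0.7, 0.9, 0.3, 0.2],
])
reference = np.array([1, 1, 1, 0])
label = ensemble_label(probs)
score = dice(label, reference)
```

In this toy case the averaged label keeps the first two voxels, and `score` compares that label with the reference mask. Averaging probabilities before thresholding lets a confident majority outvote a single model's outlier, which is the intuition behind using several models to reduce label bias.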
This digest was generated by AI and reviewed by a human editor.