[Paper Interpretation] Image segmentations produced by BAMF under the AIMI Annotations initiative
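This paper presents a large-scale, multi-center benchmark dataset for brain tumor segmentation and radiogenomic classification, generated through the RSNA-ASNR-MICCAI BraTS 2021 challenge. The study proposes a refined consensus annotation pipeline for 2,000 multiparametric MRI (mpMRI) scans. The pipeline produces high-quality segmentation labels and MGMT promoter methylation status, which supports robust evaluation of deep learning models in neuro-oncology.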
The Imaging Data Commons (IDC) (https://imaging.datacommons.cancer.gov/) [1] connects researchers with publicly available cancer imaging data, which is often linked with other types of cancer data. Many collections have limited annotations because creating them manually is expensive and labor-intensive. The growing capability of AI to analyze radiology images creates an opportunity to augment existing IDC collections with new annotation data. To further this goal, we trained several nnUNet [2] based models for a variety of radiology segmentation tasks from public datasets and used them to generate segmentations for IDC collections.
To validate the models' performance, roughly 10% of the AI predictions were assigned to a validation set. For this set, a board-certified radiologist graded the quality of the AI predictions on a Likert scale. If the radiologist did not 'strongly agree' with the AI output, they corrected the segmentation. This record provides the AI segmentations, the manually corrected segmentations, and the manual scores for the inspected IDC collection images. Only about 10% of the AI-derived annotations in this dataset have been verified by expert radiologists. Further details on model training and annotation are provided in the associated manuscript to ensure transparency and reproducibility.
This work was done in two stages. Versions 1.x of this record came from the first stage, and versions 2.x added further records. In the version 1.x collections, a medical student (non-expert) reviewed all the AI predictions and rated them on a 5-point Likert scale. For any AI predictions in the validation set that they did not 'strongly agree' with, the non-expert provided corrected segmentations. No non-expert review was used for the records added in version 2.x.
Likert Score Definition: Guidelines for reviewers to grade the quality of AI segmentations.
- 5 Strongly Agree: Use as-is (i.e., clinically acceptable and could be used for treatment without change).
- 4 Agree: Minor edits that are not necessary. Stylistic differences, but not clinically important. The current segmentation is acceptable.
- 3 Neither agree nor disagree: Minor edits that are necessary. Minor edits are those that the reviewer judges can be made in less time than starting from scratch, or that are expected to have minimal effect on treatment outcome.
- 2 Disagree: Major edits. The edit is required to ensure correctness and is significant enough that the user would prefer to start from scratch.
- 1 Strongly disagree: Unusable. The quality of the automatic annotations is so poor that they are unusable.
Zip File Folder Structure
Each zip file in the collection corresponds to a specific segmentation task. The common folder structure is:
- ai-segmentations-dcm: AI model predictions in DICOM-SEG format for all analyzed IDC collection files.
- qa-segmentations-dcm: Manually corrected segmentation files, based on the AI predictions, in DICOM-SEG format. Only a fraction (~10%) of the AI predictions were corrected. Corrections were performed by radiologists (rad*) and non-experts (ne*).
- qa-results.csv: CSV file linking the study/series UIDs with the AI segmentation file, the radiologist-corrected segmentation file, and the radiologist ratings of AI performance.
qa-results.csv Columns
The qa-results.csv file contains metadata about the segmentations and their related IDC case images. It also contains the reviewers' Likert ratings and comments.
- Collection: The name of the IDC collection for this case.
- PatientID: PatientID in the DICOM metadata of the scan. Also called Case ID in the IDC.
- StudyInstanceUID: StudyInstanceUID in the DICOM metadata of the scan.
- SeriesInstanceUID: SeriesInstanceUID in the DICOM metadata of the scan.
- Validation: true/false, indicating whether this scan was manually reviewed.
- Reviewer: Coded ID of the reviewer. Radiologist IDs start with 'rad'. Non-expert IDs start with 'ne'.
- AimiProjectYear: 2023 or 2024, because this work was split over two years. The main methodological difference is that in 2023 a non-expert also reviewed the AI output. No non-expert was used in 2024.
- AISegmentation: Filename of the AI prediction file in DICOM-SEG format, located in the ai-segmentations-dcm folder.
- CorrectedSegmentation: Filename of the reviewer-corrected prediction file in DICOM-SEG format, located in the qa-segmentations-dcm folder. If the reviewer strongly agreed with the AI for all segments, no correction file was provided.
- "Was the AI predicted ROIs accurate?": Used for images from AimiProjectYear 2023. The reviewer rates overall segmentation quality on a Likert scale. In tasks with multiple output labels, a single rating covers them all.
- "Was the AI predicted {SEGMENT_NAME} label accurate?": Used for images from AimiProjectYear 2024. This column appears once for each segment in the task, and the reviewer rates each segment's quality on a Likert scale.
- "Do you have any comments about the AI predicted ROIs?": Open-ended question for the reviewer.
- "Do you have any comments about the findings from the study scans?": Open-ended question for the reviewer.
File Overview
- brain-mr.zip: Brain tumor regions (necrosis, edema, enhancing). IDC collection: UPENN-GBM. Links: model weights, GitHub.
- breast-fdg-pet-ct.zip: FDG-avid lesions in the breast from FDG PET/CT scans. IDC collection: QIN-Breast. Links: model weights, GitHub.
- breast-mr.zip: Breast, fibroglandular tissue, and structural tumor. IDC collection: duke-breast-cancer-mri. Links: model weights, GitHub.
- kidney-ct.zip: Kidney, tumor, and cysts from contrast-enhanced CT scans. IDC collections: TCGA-KIRC, TCGA-KIRP, TCGA-KICH, CPTAC-CCRCC. Links: model weights, GitHub.
- liver-ct.zip: Liver from CT scans. IDC collection: TCGA-LIHC. Links: model weights, GitHub.
- liver2-ct.zip: Liver and lesions from CT scans. IDC collections: HCC-TACE-SEG, COLORECTAL-LIVER-METASTASES. Links: model weights, GitHub.
- liver-mr.zip: Liver from T1 MRI scans. IDC collection: TCGA-LIHC. Links: model weights, GitHub.
- lung-ct.zip: Lung and nodules (3-30 mm) from CT scans. IDC collections: Anti-PD-1-Lung, LUNG-PET-CT-Dx, NSCLC Radiogenomics, RIDER Lung PET-CT, TCGA-LUAD, TCGA-LUSC. Links: model weights 1, model weights 2, GitHub.
- lung2-ct.zip (improved model version): Lung and nodules (3-30 mm) from CT scans. IDC collections: QIN-LUNG-CT, SPIE-AAPM Lung CT Challenge. Links: model weights, GitHub.
- lung-fdg-pet-ct.zip: Lungs and FDG-avid lesions in the lung from FDG PET/CT scans. IDC collections: ACRIN-NSCLC-FDG-PET, Anti-PD-1-Lung, LUNG-PET-CT-Dx, NSCLC Radiogenomics, RIDER Lung PET-CT, TCGA-LUAD, TCGA-LUSC. Links: model weights, GitHub.
- prostate-mr.zip: Prostate from T2 MRI scans. IDC collections: ProstateX, Prostate-MRI-US-Biopsy. Links: model weights, GitHub.
Changelog
- 2.0.2: Fixed the brain-mr segmentations so that they are transformed correctly.
- 2.0.1: Added AIMI 2024 radiologist comments to qa-results.csv.
- 2.0.0: Added AIMI 2024 segmentations.
- 1.x: AIMI 2023 segmentations and reviewer scores.
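The qa-results.csv layout described above lends itself to scripted access. The following Python sketch uses only the standard library. It loads the CSV text and keeps validation rows. It also classifies reviewers by the documented 'rad'/'ne' prefixes, collects Likert scores for either project year, and resolves filenames against the documented folder names.

Several things here are assumptions not stated in the record:
- The Validation cell holds the string "true" or "false".
- Rating cells hold a bare integer score.
- The zip has been extracted to a local directory.

The helper names (load_validation_rows, ratings, and so on) and the two-row sample are hypothetical illustrations, not part of the dataset.

```python
from __future__ import annotations

import csv
import io
from pathlib import Path

# Likert scale as defined in the record's "Likert Score Definition" section.
LIKERT_LABELS = {
    5: "Strongly Agree (use as-is)",
    4: "Agree (minor edits, not necessary)",
    3: "Neither agree nor disagree (minor edits, necessary)",
    2: "Disagree (major edits)",
    1: "Strongly disagree (unusable)",
}

# 2023 rows: one rating column covering the whole task.
RATING_2023 = "Was the AI predicted ROIs accurate?"
# 2024 rows: one "Was the AI predicted {SEGMENT_NAME} label accurate?" column per segment.
PREFIX_2024 = "Was the AI predicted "
SUFFIX_2024 = " label accurate?"


def reviewer_type(reviewer_id: str) -> str:
    """Classify a coded reviewer ID using the documented 'rad'/'ne' prefixes."""
    if reviewer_id.startswith("rad"):
        return "radiologist"
    if reviewer_id.startswith("ne"):
        return "non-expert"
    return "unknown"


def load_validation_rows(text: str) -> list[dict[str, str]]:
    """Parse qa-results.csv text; keep rows whose Validation cell is 'true' (assumed format)."""
    reader = csv.DictReader(io.StringIO(text))
    return [row for row in reader if (row.get("Validation") or "").strip().lower() == "true"]


def ratings(row: dict[str, str]) -> dict[str, int]:
    """Return Likert scores keyed by segment name, or 'ALL' for the single 2023 rating.

    Assumes rating cells contain the bare integer score; blank cells are skipped.
    """
    scores: dict[str, int] = {}
    for column, value in row.items():
        if column is None or not value or not value.strip():
            continue
        if column == RATING_2023:
            scores["ALL"] = int(value)
        elif column.startswith(PREFIX_2024) and column.endswith(SUFFIX_2024):
            segment = column[len(PREFIX_2024):-len(SUFFIX_2024)]
            scores[segment] = int(value)
    return scores


def expects_correction(row: dict[str, str]) -> bool:
    """Per the record, any score below 5 (not 'strongly agree') should come with a correction."""
    return any(score < 5 for score in ratings(row).values())


def segmentation_paths(row: dict[str, str], root: Path) -> tuple[Path | None, Path | None]:
    """Resolve DICOM-SEG filenames against the documented folders of an extracted zip."""
    ai_name = row.get("AISegmentation") or ""
    qa_name = row.get("CorrectedSegmentation") or ""
    ai_path = root / "ai-segmentations-dcm" / ai_name if ai_name else None
    qa_path = root / "qa-segmentations-dcm" / qa_name if qa_name else None
    return ai_path, qa_path


# Hypothetical two-row excerpt; real files carry more columns (UIDs, comments, etc.).
SAMPLE = (
    "Collection,PatientID,Validation,Reviewer,AimiProjectYear,AISegmentation,"
    "CorrectedSegmentation,Was the AI predicted Liver label accurate?\n"
    "TCGA-LIHC,case-001,true,rad1,2024,ai_001.dcm,qa_001.dcm,4\n"
    "TCGA-LIHC,case-002,false,,2024,ai_002.dcm,,\n"
)

if __name__ == "__main__":
    for row in load_validation_rows(SAMPLE):
        ai_path, qa_path = segmentation_paths(row, Path("liver-ct"))
        print(row["PatientID"], reviewer_type(row["Reviewer"]), ratings(row),
              expects_correction(row), ai_path, qa_path)
```

Keeping the rating column names as constants makes the 2023/2024 difference explicit. A 2023 row yields a single "ALL" score, while a 2024 row yields one score per segment.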
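Motivation and Objectives
- Establish a standardized, high-quality benchmark dataset for brain tumor segmentation and radiogenomic classification.
- Enable evaluation of deep learning models on clinically relevant annotations of multi-center, multi-scanner mpMRI data.
- Support the development of generalizable, robust AI tools for glioma analysis in clinical and research settings.
- Address the limitations of previous datasets by introducing iterative expert annotation and a consensus approval process.
- Lay the groundwork for future extension beyond gliomas (e.g., post-operative scans and broader brain pathologies).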
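Proposed Method
- A multi-institutional collaboration of 85 experts from 59 institutions in 15 countries.
- An iterative annotation process in which a single annotator and multiple approvers jointly ensure label quality.
- Secure, blinded evaluation of segmentation and classification models on the Sage Bionetworks Synapse and Kaggle platforms.
- MGMT promoter methylation status standardized as a binary classification task because methods differed across institutions.
- Consensus-based refinement to minimize inter-rater variability and improve label reliability.
- Data curation focused on gliomas, excluding non-glioma abnormalities (e.g., white matter hyperintensities).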
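The consensus-refinement step above aims to reduce inter-rater variability. However, the summary names no agreement metric and notes that agreement was not formally measured. As an illustration only, the sketch below computes the Dice similarity coefficient, 2|A∩B| / (|A| + |B|), a widely used overlap measure for comparing two binary segmentation masks. The flat 0/1 lists stand in for voxel masks, and both rater masks are hypothetical. Returning 1.0 for two empty masks is a convention assumed here.

```python
from typing import Sequence


def dice(mask_a: Sequence[int], mask_b: Sequence[int]) -> float:
    """Dice similarity coefficient between two equally sized binary masks.

    Returns 1.0 when both masks are empty (a convention assumed here).
    """
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Two hypothetical annotators labelling the same 8-voxel region.
rater_1 = [0, 1, 1, 1, 0, 0, 1, 0]
rater_2 = [0, 1, 1, 0, 0, 1, 1, 0]
print(f"Dice = {dice(rater_1, rater_2):.3f}")  # 2*3 / (4+4) = 0.750
```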
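Experimental Results
Research Questions
- RQ1: How can a large-scale, multi-center brain tumor segmentation dataset be systematically curated to achieve high inter-rater agreement?
- RQ2: How does consensus-based annotation affect the quality and generalization of deep learning models in neuroimaging?
- RQ3: Can a standardized benchmark containing both segmentation and radiogenomic labels improve model performance and clinical relevance?
- RQ4: How can large-scale, multi-institutional data sharing in neuroimaging be achieved despite data silos and privacy regulations?
- RQ5: What limitations does the current dataset have in capturing diverse pathologies and treatment responses, and how should future benchmarks evolve to address them?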
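Key Findings
- The BraTS 2021 dataset contains 2,000 mpMRI scans with expert-annotated tumor subregions (enhancing, non-enhancing, necrosis, edema) and MGMT promoter methylation status.
- Iterative refinement and a consensus approval process achieved high label quality, although inter-rater agreement was not formally measured.
- The dataset is publicly available through the Sage Bionetworks and Kaggle platforms, supporting reproducible evaluation of AI models.
- A total of US$60,000 in prize money was awarded to top-performing participants to incentivize the development of high-quality models.
- The dataset provides a foundation for future extension to post-operative scans, resection cavities, and multi-disease segmentation.
- The authors advocate a shift from centralized learning to a federated learning paradigm to overcome data privacy and access barriers in multi-institutional research.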
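The last finding advocates replacing centralized training with federated learning. The summary does not name a specific algorithm. As an assumption, the sketch below shows the server-side aggregation step of federated averaging (FedAvg). In FedAvg, each institution trains locally and sends back only model parameters. The server then averages those parameters, weighting each site by its number of training cases. Site names, case counts, and parameter values are hypothetical, and real models would exchange tensors rather than two-element lists.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SiteUpdate:
    """Parameters returned by one institution after local training (hypothetical)."""
    name: str
    num_cases: int
    weights: list[float]


def fedavg(updates: list[SiteUpdate]) -> list[float]:
    """Average parameters across sites, weighted by local case counts (FedAvg aggregation)."""
    if not updates:
        raise ValueError("need at least one site update")
    size = len(updates[0].weights)
    if any(len(u.weights) != size for u in updates):
        raise ValueError("all sites must share the same model shape")
    total_cases = sum(u.num_cases for u in updates)
    return [
        sum(u.num_cases * u.weights[i] for u in updates) / total_cases
        for i in range(size)
    ]


updates = [
    SiteUpdate("site_a", num_cases=300, weights=[0.25, -1.0]),
    SiteUpdate("site_b", num_cases=100, weights=[0.75, 1.0]),
]
print(fedavg(updates))  # [0.375, -0.5]
```

Only parameters cross institutional boundaries, so the images themselves stay at each site. This property is what makes the approach attractive under the privacy constraints raised in RQ4.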
This interpretation was generated by AI and reviewed by human editors.