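[Paper Review] Image segmentations produced by BAMF under the AIMI Annotations initiative
This paper presents a large-scale, multi-institutional benchmark dataset created through the RSNA-ASNR-MICCAI BraTS 2021 challenge. Intended for brain tumor segmentation and radiogenomic classification, it provides high-quality segmentation labels and MGMT promoter methylation status for 2,000 mpMRI scans, produced through an improved, consensus-based, standardized annotation process. This enables robust evaluation of deep learning models in neuro-oncology.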
The Imaging Data Commons (IDC, https://imaging.datacommons.cancer.gov/) [1] connects researchers with publicly available cancer imaging data, often linked with other types of cancer data. Many of the collections have limited annotations because of the expense and effort required to create them manually. The increased capabilities of AI analysis of radiology images provide an opportunity to augment existing IDC collections with new annotation data. To further this goal, we trained several nnUNet [2] based models for a variety of radiology segmentation tasks from public datasets and used them to generate segmentations for IDC collections. To validate the models' performance, roughly 10% of the AI predictions were assigned to a validation set. For this set, a board-certified radiologist graded the quality of the AI predictions on a Likert scale. If the reviewer did not 'strongly agree' with the AI output, the reviewer corrected the segmentation. This record provides the AI segmentations, the manually corrected segmentations, and the manual scores for the inspected IDC collection images. Only 10% of the AI-derived annotations provided in this dataset are verified by expert radiologists. More details on model training and annotations are provided in the associated manuscript to ensure transparency and reproducibility.
This work was done in two stages. Versions 1.x of this record were from the first stage. Versions 2.x added additional records. In the Version 1.x collections, a medical student (non-expert) reviewed all the AI predictions and rated them on a 5-point Likert scale. For any AI predictions in the validation set that they did not 'strongly agree' with, the non-expert provided corrected segmentations. This non-expert was not utilized for the Version 2.x additional records.
Likert Score Definition: Guidelines for reviewers to grade the quality of AI segmentations.
- 5 Strongly Agree: Use as is (i.e., clinically acceptable, and could be used for treatment without change).
- 4 Agree: Minor edits that are not necessary. Stylistic differences, but not clinically important. The current segmentation is acceptable.
- 3 Neither agree nor disagree: Minor edits that are necessary. Minor edits are those that the reviewer judges can be made in less time than starting from scratch, or that are expected to have minimal effect on treatment outcome.
- 2 Disagree: Major edits. The necessary edits are required to ensure correctness and are significant enough that the user would prefer to start from scratch.
- 1 Strongly disagree: Unusable. The quality of the automatic annotations is so bad that they are unusable.
Zip File Folder Structure
Each zip file in the collection corresponds to a specific segmentation task. The common folder structure is:
- ai-segmentations-dcm: This directory contains the AI model predictions in DICOM-SEG format for all analyzed IDC collection files.
- qa-segmentations-dcm: This directory contains manually corrected segmentation files, based on the AI prediction, in DICOM-SEG format. Only a fraction, ~10%, of the AI predictions were corrected. Corrections were performed by radiologists (rad*) and non-experts (ne*).
- qa-results.csv: CSV file linking the study/series UIDs with the AI segmentation file, the radiologist-corrected segmentation file, and the radiologist ratings of AI performance.
qa-results.csv Columns
The qa-results.csv file contains metadata about the segmentations, their related IDC case image, and the Likert ratings and comments by the reviewers.
- Collection: The name of the IDC collection for this case.
- PatientID: PatientID in the DICOM metadata of the scan. Also called Case ID in the IDC.
- StudyInstanceUID: StudyInstanceUID in the DICOM metadata of the scan.
- SeriesInstanceUID: SeriesInstanceUID in the DICOM metadata of the scan.
- Validation: true/false, whether this scan was manually reviewed.
- Reviewer: Coded ID of the reviewer. Radiologist IDs start with 'rad'; non-expert IDs start with 'ne'.
- AimiProjectYear: 2023 or 2024. This work was split over two years. The main methodology difference between the two is that in 2023 a non-expert also reviewed the AI output, but a non-expert was not utilized in 2024.
- AISegmentation: The filename of the AI prediction file in DICOM-SEG format. This file is in the ai-segmentations-dcm folder.
- CorrectedSegmentation: The filename of the reviewer-corrected prediction file in DICOM-SEG format. This file is in the qa-segmentations-dcm folder. If the reviewer strongly agreed with the AI for all segments, they did not provide a correction file.
- Was the AI predicted ROIs accurate?: This column appears once for each segment in the task for images from AimiProjectYear 2023. The reviewer rates segmentation quality on a Likert scale. In tasks that have multiple labels in the output, there is only one rating to cover them all.
- Was the AI predicted {SEGMENT_NAME} label accurate?: This column appears once for each segment in the task for images from AimiProjectYear 2024. The reviewer rates each segment for its quality on a Likert scale.
- Do you have any comments about the AI predicted ROIs?: Open-ended question for the reviewer.
- Do you have any comments about the findings from the study scans?: Open-ended question for the reviewer.
File Overview
- brain-mr.zip
  Segment Description: Brain tumor regions: necrosis, edema, enhancing
  IDC Collection: UPENN-GBM
  Links: model weights, github
- breast-fdg-pet-ct.zip
  Segment Description: FDG-avid lesions in breast from FDG PET/CT scans
  IDC Collection: QIN-Breast
  Links: model weights, github
- breast-mr.zip
  Segment Description: Breast, fibroglandular tissue, structural tumor
  IDC Collection: duke-breast-cancer-mri
  Links: model weights, github
- kidney-ct.zip
  Segment Description: Kidney, tumor, and cysts from contrast-enhanced CT scans
  IDC Collections: TCGA-KIRC, TCGA-KIRP, TCGA-KICH, CPTAC-CCRCC
  Links: model weights, github
- liver-ct.zip
  Segment Description: Liver from CT scans
  IDC Collection: TCGA-LIHC
  Links: model weights, github
- liver2-ct.zip
  Segment Description: Liver and lesions from CT scans
  IDC Collections: HCC-TACE-SEG, COLORECTAL-LIVER-METASTASES
  Links: model weights, github
- liver-mr.zip
  Segment Description: Liver from T1 MRI scans
  IDC Collection: TCGA-LIHC
  Links: model weights, github
- lung-ct.zip
  Segment Description: Lung and nodules (3mm-30mm) from CT scans
  IDC Collections: Anti-PD-1-Lung, LUNG-PET-CT-Dx, NSCLC Radiogenomics, RIDER Lung PET-CT, TCGA-LUAD, TCGA-LUSC
  Links: model weights 1, model weights 2, github
- lung2-ct.zip (improved model version)
  Segment Description: Lung and nodules (3mm-30mm) from CT scans
  IDC Collections: QIN-LUNG-CT, SPIE-AAPM Lung CT Challenge
  Links: model weights, github
- lung-fdg-pet-ct.zip
  Segment Description: Lungs and FDG-avid lesions in the lung from FDG PET/CT scans
  IDC Collections: ACRIN-NSCLC-FDG-PET, Anti-PD-1-Lung, LUNG-PET-CT-Dx, NSCLC Radiogenomics, RIDER Lung PET-CT, TCGA-LUAD, TCGA-LUSC
  Links: model weights, github
- prostate-mr.zip
  Segment Description: Prostate from T2 MRI scans
  IDC Collections: ProstateX, Prostate-MRI-US-Biopsy
  Links: model weights, github
Changelog
- 2.0.2: Fixed the brain-mr segmentations so that they are transformed correctly
- 2.0.1: Added AIMI 2024 radiologist comments to qa-results.csv
- 2.0.0: Added AIMI 2024 segmentations
- 1.X: AIMI 2023 segmentations and reviewer scores
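As a minimal sketch of how the qa-results.csv layout described in this record might be consumed, the Python below reads rows with the standard csv module, keeps only manually reviewed scans, classifies reviewers by the documented 'rad' and 'ne' prefixes, and builds relative paths into the two segmentation folders. The sample rows, the helper names (reviewer_kind, is_validated, summarize), and the assumption that the Validation cell holds the literal text true or false are illustrative and are not taken from the record itself.

```python
import csv
import io
from pathlib import PurePosixPath

# Folder names as given in the record's zip file structure.
AI_DIR = "ai-segmentations-dcm"
QA_DIR = "qa-segmentations-dcm"


def reviewer_kind(reviewer_id):
    """Classify a coded reviewer ID by its documented prefix."""
    rid = (reviewer_id or "").strip().lower()
    if rid.startswith("rad"):
        return "radiologist"
    if rid.startswith("ne"):
        return "non-expert"
    return "unknown"


def is_validated(row):
    # Assumption: the Validation cell is the text "true" or "false" in any letter case.
    return (row.get("Validation") or "").strip().lower() == "true"


def summarize(csv_text):
    """Return one record per manually reviewed row of a qa-results.csv."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not is_validated(row):
            continue
        # An empty CorrectedSegmentation means no correction file was provided.
        corrected = (row.get("CorrectedSegmentation") or "").strip()
        records.append({
            "series": row["SeriesInstanceUID"],
            "reviewer": reviewer_kind(row.get("Reviewer")),
            "ai_path": str(PurePosixPath(AI_DIR) / row["AISegmentation"]),
            "corrected_path": str(PurePosixPath(QA_DIR) / corrected) if corrected else None,
        })
    return records


# Made-up rows for illustration only; real files carry more columns.
SAMPLE = """SeriesInstanceUID,Validation,Reviewer,AISegmentation,CorrectedSegmentation
1.2.3,true,rad1,a.dcm,a_rad1.dcm
1.2.4,false,,b.dcm,
1.2.5,TRUE,ne2,c.dcm,
"""

if __name__ == "__main__":
    for record in summarize(SAMPLE):
        print(record)
```

Reading the file with csv.DictReader keeps the sketch independent of column order, which matters here because the per-segment rating columns differ between AimiProjectYear 2023 and 2024.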
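Research Motivation and Objectives
- To build a standardized, high-quality benchmark dataset for brain tumor segmentation and radiogenomic classification.
- To enable evaluation of deep learning models on multi-institutional, multi-scanner mpMRI data with clinically relevant annotations.
- To support the development of generalizable, robust AI tools for glioma analysis in clinical and research settings.
- To address the limitations of earlier datasets by incorporating an iterative expert annotation and consensus approval process.
- To lay the groundwork for future expansion beyond glioma to post-operative scans and a broader range of brain lesions.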
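Proposed Method
- A multi-institutional collaboration involving 85 experts from 59 institutions in 15 countries.
- An iterative annotation process with a single annotator and multiple approvers to ensure annotation quality.
- Use of the Sage Bionetworks Synapse and Kaggle platforms for secure, anonymized evaluation of segmentation and classification models.
- Standardization of MGMT promoter methylation status as a binary label, because assessment methodologies vary across institutions.
- A consensus-based refinement process to minimize inter-rater variability and improve label reliability.
- Data curation focused on glioma, excluding non-glioma lesions such as white matter disease.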
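Experimental Results
Research Questions
- RQ1: How can a multi-institutional brain tumor segmentation dataset be systematically curated to achieve a high level of inter-rater agreement?
- RQ2: How does a consensus-based annotation process affect the quality and generalizability of deep learning models in neuroimaging?
- RQ3: Does a standardized benchmark dataset that includes both segmentation and radiogenomic labels contribute to better model performance and greater clinical relevance?
- RQ4: How can data silos and privacy regulations be overcome to enable large-scale, multi-institutional data sharing in neuroimaging?
- RQ5: In what ways do current datasets fail to adequately reflect diverse lesions and treatment responses, and how should future benchmark datasets evolve?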
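Key Findings
- The BraTS 2021 dataset comprises 2,000 mpMRI scans with expert-annotated tumor sub-regions (enhancing, non-enhancing, necrosis, edema) and MGMT promoter methylation status.
- Iterative refinement and consensus approval were used to achieve high label quality, but inter-rater agreement was not formally measured.
- The dataset, released through Sage Bionetworks and Kaggle, supports reproducible evaluation of AI models.
- A total of $60,000 in prizes was awarded to the top-performing participants to encourage the development of high-quality models.
- The dataset can serve as a foundation for future expansion to post-operative scans, resection cavities, and multi-disease segmentation.
- The authors argue for a shift from centralized learning to federated learning to overcome data privacy and access barriers in multi-institutional research.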
This review was generated by AI and reviewed by a human editor.