[Paper Review] Image segmentations produced by BAMF under the AIMI Annotations initiative
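This paper presents a large-scale, multi-institutional benchmark dataset for brain tumor segmentation and radiogenomic classification through the RSNA-ASNR-MICCAI BraTS 2021 challenge. By applying a refined, consensus-based annotation process to 2,000 mpMRI cases, it produces high-quality segmentation labels and MGMT promoter methylation status, enabling robust evaluation of deep learning models in neuro-oncology.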
The Imaging Data Commons (IDC) (https://imaging.datacommons.cancer.gov/) [1] connects researchers with publicly available cancer imaging data, often linked with other types of cancer data. Many of the collections have limited annotations because of the expense and effort required to create them manually. The increased capabilities of AI analysis of radiology images provide an opportunity to augment existing IDC collections with new annotation data. To further this goal, we trained several nnUNet [2] based models for a variety of radiology segmentation tasks from public datasets and used them to generate segmentations for IDC collections.

To validate the models' performance, roughly 10% of the AI predictions were assigned to a validation set. For this set, a board-certified radiologist graded the quality of the AI predictions on a Likert scale. If they did not 'strongly agree' with the AI output, the reviewer corrected the segmentation.

This record provides the AI segmentations, manually corrected segmentations, and manual scores for the inspected IDC collection images. Only 10% of the AI-derived annotations provided in this dataset are verified by expert radiologists. More details on model training and annotations are provided in the associated manuscript to ensure transparency and reproducibility.

This work was done in two stages. Versions 1.x of this record were from the first stage. Versions 2.x added additional records. In the Version 1.x collections, a medical student (non-expert) reviewed all the AI predictions and rated them on a 5-point Likert scale; for any AI predictions in the validation set that they did not 'strongly agree' with, the non-expert provided corrected segmentations. This non-expert was not utilized for the Version 2.x additional records.

Likert Score Definition
Guidelines for reviewers to grade the quality of AI segmentations.
- 5 Strongly agree: Use as is (i.e., clinically acceptable, and could be used for treatment without change).
- 4 Agree: Minor edits that are not necessary. Stylistic differences, but not clinically important. The current segmentation is acceptable.
- 3 Neither agree nor disagree: Minor edits that are necessary. Minor edits are those that the reviewer judges can be made in less time than starting from scratch or are expected to have minimal effect on treatment outcome.
- 2 Disagree: Major edits. This category indicates that the necessary edit is required to ensure correctness, and is sufficiently significant that the user would prefer to start from scratch.
- 1 Strongly disagree: Unusable. This category indicates that the quality of the automatic annotations is so bad that they are unusable.

Zip File Folder Structure
Each zip file in the collection corresponds to a specific segmentation task. The common folder structure is:
- ai-segmentations-dcm: This directory contains the AI model predictions in DICOM-SEG format for all analyzed IDC collection files.
- qa-segmentations-dcm: This directory contains manually corrected segmentation files, based on the AI prediction, in DICOM-SEG format. Only a fraction, ~10%, of the AI predictions were corrected. Corrections were performed by radiologists (rad*) and non-experts (ne*).
- qa-results.csv: CSV file linking the study/series UIDs with the AI segmentation file, the radiologist-corrected segmentation file, and the radiologist ratings of AI performance.

qa-results.csv Columns
The qa-results.csv file contains metadata about the segmentations and their related IDC case images, as well as the Likert ratings and comments by the reviewers.
- Collection: The name of the IDC collection for this case.
- PatientID: PatientID in the DICOM metadata of the scan. Also called Case ID in the IDC.
- StudyInstanceUID: StudyInstanceUID in the DICOM metadata of the scan.
- SeriesInstanceUID: SeriesInstanceUID in the DICOM metadata of the scan.
- Validation: true/false, indicating whether this scan was manually reviewed.
- Reviewer: Coded ID of the reviewer. Radiologist IDs start with 'rad'; non-expert IDs start with 'ne'.
- AimiProjectYear: 2023 or 2024. This work was split over two years. The main methodology difference between the two is that in 2023 a non-expert also reviewed the AI output, but a non-expert was not utilized in 2024.
- AISegmentation: The filename of the AI prediction file in DICOM-SEG format. This file is in the ai-segmentations-dcm folder.
- CorrectedSegmentation: The filename of the reviewer-corrected prediction file in DICOM-SEG format. This file is in the qa-segmentations-dcm folder. If the reviewer strongly agreed with the AI for all segments, they did not provide any correction file.
- "Was the AI predicted ROIs accurate?": This column applies to images from AimiProjectYear 2023. The reviewer rates segmentation quality on a Likert scale. In tasks that have multiple labels in the output, a single rating covers them all.
- "Was the AI predicted {SEGMENT_NAME} label accurate?": This column appears once for each segment in the task for images from AimiProjectYear 2024. The reviewer rates each segment for its quality on a Likert scale.
- "Do you have any comments about the AI predicted ROIs?": Open-ended question for the reviewer.
- "Do you have any comments about the findings from the study scans?": Open-ended question for the reviewer.

File Overview
- brain-mr.zip
  Segment Description: Brain tumor regions: necrosis, edema, enhancing
  IDC Collection: UPENN-GBM
  Links: model weights, github
- breast-fdg-pet-ct.zip
  Segment Description: FDG-avid lesions in breast from FDG PET/CT scans
  IDC Collection: QIN-Breast
  Links: model weights, github
- breast-mr.zip
  Segment Description: Breast, fibroglandular tissue, structural tumor
  IDC Collection: duke-breast-cancer-mri
  Links: model weights, github
- kidney-ct.zip
  Segment Description: Kidney, tumor, and cysts from contrast-enhanced CT scans
  IDC Collections: TCGA-KIRC, TCGA-KIRP, TCGA-KICH, CPTAC-CCRCC
  Links: model weights, github
- liver-ct.zip
  Segment Description: Liver from CT scans
  IDC Collection: TCGA-LIHC
  Links: model weights, github
- liver2-ct.zip
  Segment Description: Liver and lesions from CT scans
  IDC Collections: HCC-TACE-SEG, COLORECTAL-LIVER-METASTASES
  Links: model weights, github
- liver-mr.zip
  Segment Description: Liver from T1 MRI scans
  IDC Collection: TCGA-LIHC
  Links: model weights, github
- lung-ct.zip
  Segment Description: Lung and nodules (3mm-30mm) from CT scans
  IDC Collections: Anti-PD-1-Lung, LUNG-PET-CT-Dx, NSCLC Radiogenomics, RIDER Lung PET-CT, TCGA-LUAD, TCGA-LUSC
  Links: model weights 1, model weights 2, github
- lung2-ct.zip (improved model version)
  Segment Description: Lung and nodules (3mm-30mm) from CT scans
  IDC Collections: QIN-LUNG-CT, SPIE-AAPM Lung CT Challenge
  Links: model weights, github
- lung-fdg-pet-ct.zip
  Segment Description: Lungs and FDG-avid lesions in the lung from FDG PET/CT scans
  IDC Collections: ACRIN-NSCLC-FDG-PET, Anti-PD-1-Lung, LUNG-PET-CT-Dx, NSCLC Radiogenomics, RIDER Lung PET-CT, TCGA-LUAD, TCGA-LUSC
  Links: model weights, github
- prostate-mr.zip
  Segment Description: Prostate from T2 MRI scans
  IDC Collections: ProstateX, Prostate-MRI-US-Biopsy
  Links: model weights, github

Changelog
- 2.0.2: Fixed the brain-mr segmentations to be transformed correctly
- 2.0.1: Added AIMI 2024 radiologist comments to qa-results.csv
- 2.0.0: Added AIMI 2024 segmentations
- 1.X: AIMI 2023 segmentations and reviewer scores
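The folder layout and qa-results.csv columns described above suggest a simple way to pair each AI prediction with its reviewer correction. The following is a minimal sketch using only the Python standard library, not an official loader for this record. It assumes that the Validation column holds the literal strings true/false (compared case-insensitively), that the AISegmentation and CorrectedSegmentation values are bare file names relative to the two folders, and that an empty CorrectedSegmentation cell means no correction was provided. The sample rows, file names, and reviewer IDs are hypothetical and exist only to make the example self-contained.

```python
import csv
import io
from pathlib import PurePosixPath

# Folder names as given in the zip file folder structure.
AI_DIR = "ai-segmentations-dcm"
QA_DIR = "qa-segmentations-dcm"

# Hypothetical sample rows; real files contain more columns and real UIDs.
SAMPLE = """Collection,PatientID,Validation,Reviewer,AimiProjectYear,AISegmentation,CorrectedSegmentation
UPENN-GBM,case-001,true,rad1,2024,case-001_ai.dcm,case-001_rad1.dcm
UPENN-GBM,case-002,true,rad1,2024,case-002_ai.dcm,
UPENN-GBM,case-003,false,,2024,case-003_ai.dcm,
"""


def _cell(row, name):
    """Return a stripped cell value, treating missing cells as empty."""
    return (row.get(name) or "").strip()


def reviewer_type(reviewer_id):
    """Map a coded reviewer ID to a reviewer type via its documented prefix."""
    rid = reviewer_id.strip().lower()
    if rid.startswith("rad"):
        return "radiologist"
    if rid.startswith("ne"):
        return "non-expert"
    return "unknown"


def corrected_pairs(csv_file):
    """List (reviewer type, AI path, corrected path) for corrected cases.

    Rows are kept only if they were manually reviewed (Validation is true)
    and the reviewer supplied a correction file.
    """
    pairs = []
    for row in csv.DictReader(csv_file):
        validated = _cell(row, "Validation").lower() == "true"
        corrected = _cell(row, "CorrectedSegmentation")
        if not validated or not corrected:
            continue
        pairs.append((
            reviewer_type(_cell(row, "Reviewer")),
            str(PurePosixPath(AI_DIR) / _cell(row, "AISegmentation")),
            str(PurePosixPath(QA_DIR) / corrected),
        ))
    return pairs


if __name__ == "__main__":
    for kind, ai_path, qa_path in corrected_pairs(io.StringIO(SAMPLE)):
        print(kind, ai_path, qa_path)
```

To run this against an extracted zip file, pass an open file handle for its qa-results.csv instead of the in-memory sample. Rows that were reviewed but have no correction file are skipped here; per the column description, that corresponds to cases where the reviewer strongly agreed with the AI for all segments.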
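Research Motivation and Objectives
- Establish a standardized, high-quality benchmark dataset for brain tumor segmentation and radiogenomic classification.
- Enable evaluation of deep learning models on multi-institutional, multi-scanner mpMRI data with clinically relevant annotations.
- Support the development of generalizable, robust AI tools for glioma analysis in clinical and research settings.
- Address the limitations of earlier datasets by introducing iterative expert annotation and a consensus approval process.
- Lay the groundwork for future extensions beyond glioma, including post-operative scans and a broader range of brain diseases.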
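Proposed Method
- A multi-institutional collaboration of 85 experts from 59 institutions across 15 countries.
- An iterative annotation process with a single annotator and multiple approvers to ensure label quality.
- Use of the Sage Bionetworks Synapse and Kaggle platforms for secure, blinded evaluation of segmentation and classification models.
- Standardization of MGMT promoter methylation status into a binary classification to account for methodological variation across institutions.
- Consensus-based refinement to reduce inter-rater variability and improve label reliability.
- Data curation focused on glioma, excluding non-glioma abnormalities such as white matter hyperintensities.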
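Experimental Results
Research Questions
- RQ1: How can a large-scale, multi-institutional brain tumor segmentation dataset be curated systematically while ensuring high inter-annotator reliability?
- RQ2: How does consensus-based annotation affect the quality and generalization performance of deep learning models in neuroimaging?
- RQ3: Can a standardized benchmark that combines segmentation and radiogenomic labels contribute to better model performance and greater clinical relevance?
- RQ4: How can data silos and privacy regulations be overcome to enable large-scale, multi-institutional data sharing in neuroimaging?
- RQ5: Current datasets do not adequately capture diverse pathologies and treatment responses. How should future benchmarks evolve to address these limitations?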
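Key Findings
- The BraTS 2021 dataset contains 2,000 mpMRI cases with expert-annotated tumor subregions (enhancing, non-enhancing, necrosis, edema) and MGMT promoter methylation status.
- Iterative refinement and consensus approval yielded high label quality, although inter-rater agreement was not formally measured.
- The dataset is publicly available through Sage Bionetworks and Kaggle, supporting reproducible evaluation of AI models.
- A total of $60,000 in prizes was awarded to top-performing participants, encouraging the development of high-quality models.
- The dataset serves as a foundation for future extensions, with planned coverage of post-operative scans, resection cavities, and multi-disease segmentation.
- The authors advocate a shift from centralized approaches to federated learning in order to overcome data privacy and access barriers in multi-institutional research.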
This review was generated by AI and checked by a human editor.