QUICK REVIEW

[Paper Review] The MAMA-MIA Challenge: Advancing Generalizability and Fairness in Breast MRI Tumor Segmentation and Treatment Response Prediction

Lidia Garrucho, Smriti Joshi|arXiv (Cornell University)|Mar 1, 2026

MRI in cancer diagnosis | Citations: 0

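One-line summary

This paper introduces the MAMA-MIA Challenge, which evaluates breast DCE-MRI tumor segmentation and prediction of pathologic complete response (pCR) on a large-scale benchmark that combines cross-institutional generalization testing with fairness assessment across demographic subgroups. It reports the final leaderboard and insights into the trade-off between accuracy and fairness.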

ABSTRACT

Breast cancer is the most frequently diagnosed malignancy among women worldwide and a leading cause of cancer-related mortality. Dynamic contrast-enhanced magnetic resonance imaging plays a central role in tumor characterization and treatment monitoring, particularly in patients receiving neoadjuvant chemotherapy. However, existing artificial intelligence models for breast magnetic resonance imaging are often developed using single-center data and evaluated using aggregate performance metrics, limiting their generalizability and obscuring potential performance disparities across demographic subgroups. The MAMA-MIA Challenge was designed to address these limitations by introducing a large-scale benchmark that jointly evaluates primary tumor segmentation and prediction of pathologic complete response using pre-treatment magnetic resonance imaging only. The training cohort comprised 1,506 patients from multiple institutions in the United States, while evaluation was conducted on an external test set of 574 patients from three independent European centers to assess cross-continental and cross-institutional generalization. A unified scoring framework combined predictive performance with subgroup consistency across age, menopausal status, and breast density. Twenty-six international teams participated in the final evaluation phase. Results demonstrate substantial performance variability under external testing and reveal trade-offs between overall accuracy and subgroup fairness. The challenge provides standardized datasets, evaluation protocols, and public resources to promote the development of robust and equitable artificial intelligence systems for breast cancer imaging.

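Motivation and Objectives

Motivates robust and fair AI for breast cancer imaging by addressing the limited generalizability of single-center studies.
Jointly evaluates primary tumor segmentation and pre-treatment pCR prediction within a unified framework.
Assesses model fairness across age, menopausal status, and breast density subgroups.
Provides standardized datasets, protocols, and baseline resources to promote reproducible and fair AI.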

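Proposed Method

Defines a two-task benchmark: Task 1 is automated primary tumor segmentation; Task 2 is pCR prediction from pre-treatment MRI only.
Trains on a multi-institutional US cohort (n=1,506) and evaluates on an external test set from three independent European centers (n=574) to assess cross-institutional and cross-continental generalization.
Uses a unified scoring framework that combines accuracy and fairness, weighting the two equally (λ = 0.5).
Assesses fairness across subgroups defined by age, menopausal status, and breast density.
Provides standardized preprocessing and a containerized evaluation workflow on CodaBench for reproducibility.
Compares a diverse field of teams (26 teams from 14 countries) to analyze design trends and the accuracy–fairness trade-off.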
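The unified score can be sketched in Python. Two parts are grounded in the review: λ = 0.5 (equal weighting) and the leaderboard values. On the Task 1 rows checked (e.g., MIC, FME, CALADAN, CIG@Illinois), Combined Score equals the mean of Fairness Score and Performance Score, and Performance Score equals the mean of DSC and (1 − NormHD), up to rounding; the paper may define these differently, so treat both as inferences from the table. The review does not say how the fairness score itself is computed, so `fairness_score` below is a hypothetical placeholder (1 minus the average largest gap between subgroups) that only shows where per-subgroup results for age, menopausal status, and breast density would plug in.

```python
from statistics import mean


def performance_score(dsc: float, norm_hd: float) -> float:
    """INFERRED from the Task 1 leaderboard: mean of DSC and (1 - NormHD)."""
    return 0.5 * dsc + 0.5 * (1.0 - norm_hd)


def fairness_score(subgroups: dict[str, dict[str, float]]) -> float:
    """HYPOTHETICAL placeholder, not the challenge's documented formula.

    `subgroups` maps a grouping variable (e.g. "age") to per-subgroup scores.
    Returns 1 minus the mean, over variables, of the largest subgroup gap.
    """
    gaps = [max(scores.values()) - min(scores.values()) for scores in subgroups.values()]
    return 1.0 - mean(gaps)


def combined_score(fairness: float, performance: float, lam: float = 0.5) -> float:
    """Weighted sum with lam = 0.5 (equal weights, as stated in the review).

    Which term lam multiplies is an assumption; at lam = 0.5 it makes no difference.
    """
    return lam * fairness + (1.0 - lam) * performance


if __name__ == "__main__":
    perf = performance_score(dsc=0.7360, norm_hd=0.0990)  # MIC row of the leaderboard
    print(round(perf, 4))  # 0.8185
    print(round(combined_score(0.9531, perf), 4))  # 0.8858
    toy = {  # toy per-subgroup Dice values for illustration, NOT challenge data
        "age": {"<40": 0.70, "40-55": 0.74, ">55": 0.72},
        "menopause": {"pre": 0.71, "post": 0.73},
        "density": {"A": 0.75, "B": 0.73, "C": 0.72, "D": 0.69},
    }
    print(round(fairness_score(toy), 4))  # 0.96
```

Keeping the performance and fairness terms as separate functions mirrors the leaderboard columns, so a different (documented) fairness definition can be swapped in without touching the combination step.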

実験結果

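Research Questions

RQ1: How well do models generalize across institutions and continents for breast MRI tumor segmentation and pCR prediction?
RQ2: How do demographic factors (age, menopausal status, breast density) affect model performance and fairness?
RQ3: What is the trade-off between predictive accuracy and subgroup fairness among state-of-the-art methods?
RQ4: Which architectures and training strategies yield robust and fair performance under cross-site evaluation?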

Key Findings

Task 1 (primary tumor segmentation) final leaderboard
Rank	Team	Combined Score	Fairness Score	Performance Score	DSC (Dice)	NormHD (normalized Hausdorff distance; lower is better)
1	MIC	0.8858	0.9531	0.8185	0.7360	0.0990
2	FME	0.8820	0.9574	0.8066	0.7125	0.0993
3	ViCOROB	0.8782	0.9482	0.8083	0.7182	0.1017
4	Martel Lab	0.8735	0.9449	0.8021	0.7121	0.1078
5	AIH-Mama	0.8677	0.9532	0.7823	0.6914	*0.1268*
6	HWT@YCH	0.8655	0.9339	0.7971	0.7080	0.1138
7	Flamingo	0.8640	0.9434	0.7847	0.7033	*0.1338*
8	CALADAN	0.8631	0.9621	0.7640	0.7022	*0.1742*
9	bigAI	0.8517	0.9464	0.7570	0.6872	*0.1732*
10	Shangqi,Gao@CAM	0.8485	0.9621	0.7349	0.6101	0.1404
11	GK_KI	0.8451	0.9581	0.7321	0.6330	0.1688
12	Jeff	0.8439	0.9519	0.7360	0.7025	*0.2305*
13	Baseline	0.8290	0.9373	0.7208	0.6871	0.2455
14	Dynamo	0.8290	0.9373	0.7208	0.6871	*0.2455*
15	PM	0.8290	0.9373	0.7208	0.6871	*0.2455*
16	AEHRC-MIA	0.8256	0.9261	0.7251	0.6781	*0.2280*
17	AI Strollers	0.8030	0.9156	0.6904	0.6296	*0.2489*
18	MedImgLab_Unipa	0.7270	0.9084	0.5456	0.4717	0.3805
19	FPixel	0.7270	0.9084	0.5456	0.4717	0.3805
20	BWS-KNU	0.7257	0.9382	0.5132	0.4556	0.4291
21	CIG@Illinois	0.6593	0.8931	0.4256	0.5195	0.6683
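The DSC and NormHD columns are an overlap metric and a boundary-distance metric. A minimal NumPy sketch on binary masks follows; the Dice and Hausdorff definitions are the standard ones, but the review does not state how NormHD is normalized, so dividing by the image diagonal in `normalized_hausdorff` is an assumption. The brute-force pairwise distance matrix is only practical for small masks; full 3D MRI volumes would call for surface extraction or a distance transform.

```python
import numpy as np


def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice similarity coefficient 2|P∩G| / (|P| + |G|) for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        return 1.0  # convention: two empty masks agree perfectly
    return float(2.0 * np.logical_and(pred, gt).sum() / denom)


def hausdorff(pred: np.ndarray, gt: np.ndarray) -> float:
    """Symmetric Hausdorff distance (in voxels) between foreground voxels."""
    p, g = np.argwhere(pred), np.argwhere(gt)
    if len(p) == 0 or len(g) == 0:
        raise ValueError("Hausdorff distance is undefined for an empty mask")
    # Pairwise Euclidean distances, shape (len(p), len(g)); small masks only.
    d = np.linalg.norm(p[:, None, :] - g[None, :, :], axis=-1)
    # Larger of the two directed distances: pred -> gt and gt -> pred.
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))


def normalized_hausdorff(pred: np.ndarray, gt: np.ndarray) -> float:
    """ASSUMPTION: HD divided by the image diagonal; the challenge may normalize differently."""
    diagonal = float(np.linalg.norm(pred.shape))
    return hausdorff(pred, gt) / diagonal


gt = np.zeros((10, 10), dtype=bool)
gt[2:6, 2:6] = True
pred = np.zeros_like(gt)
pred[3:7, 2:6] = True  # the same 4x4 square shifted down by one row
print(dice(pred, gt))  # 0.75
print(hausdorff(pred, gt))  # 1.0
print(round(normalized_hausdorff(pred, gt), 4))  # 0.0707
```

A one-row shift leaves 12 of 16 pixels overlapping (Dice 0.75) while every boundary pixel stays within one pixel of the other mask (Hausdorff 1.0), which is why the two metrics are reported side by side.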

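In Task 1, 12 teams outperformed the baseline on the combined score; all 12 also exceeded it on the performance score, and all but one (HWT@YCH) on the fairness score.
The top Task 1 methods improved substantially on the baseline, raising DSC (0.7360 for MIC vs. 0.6871) and lowering NormHD (0.0990 vs. 0.2455).
In Task 2, only three teams surpassed the baseline; all three improved fairness, and two also exceeded the baseline on performance.
Under external testing, the challenge revealed substantial performance variability and a clear trade-off between overall accuracy and subgroup fairness: for example, CALADAN and Shangqi,Gao@CAM share the highest Task 1 fairness score (0.9621) but rank 8th and 10th overall, while first-ranked MIC has the highest performance score (0.8185) with a lower fairness score (0.9531).
The benchmark provides standardized datasets, evaluation code, and reporting guidelines to promote robust and fair AI for breast cancer imaging.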
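The accuracy–fairness trade-off can be made concrete with the Performance Score and Fairness Score columns of the Task 1 leaderboard. The sketch below is an illustrative Pareto-front analysis, not one reported in the paper: a team stays on the front if no other team is at least as good on both scores and strictly better on one. The helper names `dominates` and `pareto_front` are hypothetical.

```python
# (team, performance score, fairness score) copied from the Task 1 leaderboard.
ROWS = [
    ("MIC", 0.8185, 0.9531),
    ("FME", 0.8066, 0.9574),
    ("ViCOROB", 0.8083, 0.9482),
    ("Martel Lab", 0.8021, 0.9449),
    ("AIH-Mama", 0.7823, 0.9532),
    ("HWT@YCH", 0.7971, 0.9339),
    ("Flamingo", 0.7847, 0.9434),
    ("CALADAN", 0.7640, 0.9621),
    ("bigAI", 0.7570, 0.9464),
    ("Shangqi,Gao@CAM", 0.7349, 0.9621),
    ("GK_KI", 0.7321, 0.9581),
    ("Jeff", 0.7360, 0.9519),
    ("Baseline", 0.7208, 0.9373),
    ("Dynamo", 0.7208, 0.9373),
    ("PM", 0.7208, 0.9373),
    ("AEHRC-MIA", 0.7251, 0.9261),
    ("AI Strollers", 0.6904, 0.9156),
    ("MedImgLab_Unipa", 0.5456, 0.9084),
    ("FPixel", 0.5456, 0.9084),
    ("BWS-KNU", 0.5132, 0.9382),
    ("CIG@Illinois", 0.4256, 0.8931),
]


def dominates(a, b) -> bool:
    """True if row a is at least as good as row b on both scores and strictly better on one."""
    _, perf_a, fair_a = a
    _, perf_b, fair_b = b
    return perf_a >= perf_b and fair_a >= fair_b and (perf_a > perf_b or fair_a > fair_b)


def pareto_front(rows):
    """Names of teams not dominated by any other team, in leaderboard order."""
    return [row[0] for row in rows if not any(dominates(other, row) for other in rows)]


print(pareto_front(ROWS))  # ['MIC', 'FME', 'CALADAN']
```

Along this front, moving from MIC to CALADAN gains roughly 0.009 in fairness score at a cost of roughly 0.055 in performance score, which is the kind of trade-off the combined score with λ = 0.5 has to arbitrate.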

This review was generated by AI and checked by a human editor.