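[Paper Review] Deep Neural Networks Improve Radiologists' Performance in Breast Cancer Screening
A two-stage deep CNN trained with breast-level and pixel-level labels performs on par with radiologists on screening mammograms and, when used as a second reader, improves radiologists' performance. A hybrid of the radiologist and the model outperforms either one alone.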
We present a deep convolutional neural network for breast cancer screening exam classification, trained and evaluated on over 200,000 exams (over 1,000,000 images). Our network achieves an AUC of 0.895 in predicting whether there is a cancer in the breast, when tested on the screening population. We attribute the high accuracy of our model to a two-stage training procedure, which allows us to use a very high-capacity patch-level network to learn from pixel-level labels alongside a network learning from macroscopic breast-level labels. To validate our model, we conducted a reader study with 14 readers, each reading 720 screening mammogram exams, and found our model to be as accurate as experienced radiologists when presented with the same data. Finally, we show that a hybrid model, averaging the probability of malignancy predicted by a radiologist with the prediction of our neural network, is more accurate than either of the two separately. To better understand our results, we conduct a thorough analysis of our network's performance on different subpopulations of the screening population, model design, training procedure, errors, and properties of its internal representations.
Motivation and Objectives
- Improve breast cancer screening accuracy while reducing false positives.
- Exploit large-scale pixel-level and breast-level labels to train high-capacity networks.
- Develop a two-stage training framework that combines patch-level heatmaps with a breast-level classifier.
- Evaluate model performance against radiologists and in radiologist–model hybrids.
Proposed Method
- Use four view-specific ResNet-22-based columns, one for each standard screening view (left and right CC, left and right MLO).
- Train an auxiliary patch-level network on 256×256 patches with malignant/benign labels derived from pixel-level segmentations.
- Generate malignant and benign heatmaps from the patch-level predictions and feed them to the breast-level model as extra input channels.
- Train in two stages rather than end-to-end, so the model learns from pixel-level as well as breast-level labels.
- Ensemble five models trained from different initializations to improve robustness.
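The heatmap step can be illustrated with a short NumPy sketch. It is a minimal sketch under stated assumptions, not the authors' code: `classify_patch` is a hypothetical stand-in for the trained patch-level network, and the 128-pixel stride, per-pixel averaging of overlapping windows, and the (image, malignant, benign) channel order are assumptions. Only the 256×256 patch size and the idea of feeding heatmaps to the breast-level model as extra channels come from the review.

```python
import numpy as np

PATCH = 256   # patch size stated in the review
STRIDE = 128  # assumption: the review does not give the sliding-window stride


def patch_heatmaps(image, classify_patch, patch=PATCH, stride=STRIDE):
    """Return a (2, H, W) array of malignant/benign heatmaps for a 2-D image.

    `classify_patch` is a hypothetical stand-in for the trained patch-level
    network: it maps one (patch, patch) crop to (p_malignant, p_benign).
    Overlapping windows are averaged per pixel (an assumption of this sketch).
    """
    h, w = image.shape
    assert h >= patch and w >= patch and 0 < stride <= patch
    ys = list(range(0, h - patch + 1, stride))
    xs = list(range(0, w - patch + 1, stride))
    # Clamp a final window to the bottom/right edge so every pixel is covered.
    if ys[-1] != h - patch:
        ys.append(h - patch)
    if xs[-1] != w - patch:
        xs.append(w - patch)

    sums = np.zeros((2, h, w))  # accumulated malignant/benign probabilities
    counts = np.zeros((h, w))   # number of windows covering each pixel
    for y in ys:
        for x in xs:
            p_mal, p_ben = classify_patch(image[y:y + patch, x:x + patch])
            sums[0, y:y + patch, x:x + patch] += p_mal
            sums[1, y:y + patch, x:x + patch] += p_ben
            counts[y:y + patch, x:x + patch] += 1
    return sums / counts  # counts broadcasts over the leading heatmap axis


def image_and_heatmaps(image, heatmaps):
    """Stack the image with its two heatmaps as a 3-channel (C, H, W) input."""
    return np.concatenate([image[None, :, :], heatmaps], axis=0)


# Toy usage with a hypothetical classifier that scores a patch by its mean intensity.
def dummy_classifier(crop):
    p = float(crop.mean())
    return p, 1.0 - p


image = np.full((512, 600), 0.25)
heatmaps = patch_heatmaps(image, dummy_classifier)
x = image_and_heatmaps(image, heatmaps)
print(x.shape)  # (3, 512, 600)
```

Averaging overlapping windows keeps each heatmap at the image's resolution, which is what allows it to be stacked as a channel; a larger stride trades heatmap detail for fewer patch evaluations.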
Experimental Results
Research Questions
- RQ1: Can a deep CNN trained with breast-level and pixel-level labels achieve radiologist-level accuracy on screening mammograms?
- RQ2: Does adding patch-level heatmaps improve malignant/benign predictions compared to image-only models?
- RQ3: Do hybrids of radiologists and the CNN outperform either alone?
- RQ4: How does model performance vary across populations (screening vs. biopsied) and patient subgroups (age, breast density)?
Key Findings
| Population | Model | Malignant AUC (single) | Benign AUC (single) | Malignant AUC (ensemble) | Benign AUC (ensemble) |
|---|---|---|---|---|---|
| Screening population | image-only | 0.827 ± 0.008 | 0.731 ± 0.004 | 0.840 | 0.743 |
| Screening population | image-and-heatmaps | 0.886 ± 0.003 | 0.747 ± 0.002 | 0.895 | 0.756 |
| Biopsied subpopulation | image-only | 0.781 ± 0.006 | 0.673 ± 0.003 | 0.791 | 0.682 |
| Biopsied subpopulation | image-and-heatmaps | 0.843 ± 0.004 | 0.690 ± 0.002 | 0.850 | 0.696 |
- On the screening population, image-only AUCs were 0.827 (malignant) and 0.731 (benign) for single models; ensemble improved to 0.840 and 0.743.
- Image-and-heatmaps models achieved 0.886 (malignant) and 0.747 (benign) as single; 0.895 and 0.756 with ensemble.
- On the biopsied subpopulation, image-only single AUCs were 0.781 (malignant) and 0.673 (benign); image-and-heatmaps reached 0.843 (malignant) and 0.690 (benign) as single, 0.850 and 0.696 with ensemble.
- Individual radiologists achieved AUCs of 0.705–0.860 (mean 0.778) and PRAUCs (area under the precision-recall curve) of 0.244–0.453 (mean 0.364).
- A hybrid model that averages a radiologist's predicted probability of malignancy with the CNN's prediction achieved higher AUC and PRAUC than either alone (mean hybrid AUC 0.891, PRAUC 0.431).
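The hybrid in the last finding, like the five-model ensemble, comes down to averaging probabilities of malignancy before scoring. The sketch below assumes an unweighted element-wise mean and computes ROC AUC with the Mann-Whitney statistic. The helper names `average_probabilities` and `auc`, and the labels and scores, are hypothetical toy illustrations, not data from the paper.

```python
def auc(labels, scores):
    """ROC AUC via the Mann-Whitney statistic; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_probabilities(*predictions):
    """Element-wise mean of several lists of malignancy probabilities.

    Assumption: an unweighted mean, usable for a radiologist + CNN hybrid
    or for averaging the outputs of several ensemble members.
    """
    return [sum(ps) / len(ps) for ps in zip(*predictions)]


# Made-up toy exams: 1 = malignant, 0 = not malignant.
labels = [1, 1, 0, 0, 0, 0]
radiologist = [0.9, 0.3, 0.6, 0.2, 0.1, 0.4]
cnn = [0.5, 0.8, 0.3, 0.6, 0.2, 0.1]
hybrid = average_probabilities(radiologist, cnn)

print(auc(labels, radiologist), auc(labels, cnn), auc(labels, hybrid))
# 0.75 0.875 1.0
```

In this made-up example the radiologist and the CNN misrank different malignant exams, so averaging corrects both mistakes. That complementarity is the intuition behind the hybrid; the actual gain depends on how correlated the two readers' errors are.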
This review was generated by AI and checked by a human editor.