QUICK REVIEW

[Paper Review] Reading Race: AI Recognises Patient's Racial Identity In Medical Images

Imon Banerjee, Ananth Reddy Bhimireddy|arXiv (Cornell University)|Jul 21, 2021

Artificial Intelligence in Healthcare and Education|References: 13|Citations: 31

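One-line summary

This paper shows, with external validation, that deep learning models can predict patients' self-reported race across multiple medical imaging modalities, and it argues that this creates a serious risk for AI deployment in radiology.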

ABSTRACT

Background: In medical imaging, prior studies have demonstrated disparate AI performance by race, yet there is no known correlation for race on medical imaging that would be obvious to the human expert interpreting the images. Methods: Using private and public datasets we evaluate: A) performance quantification of deep learning models to detect race from medical images, including the ability of these models to generalize to external environments and across multiple imaging modalities, B) assessment of possible confounding anatomic and phenotype population features, such as disease distribution and body habitus as predictors of race, and C) investigation into the underlying mechanism by which AI models can recognize race. Findings: Standard deep learning models can be trained to predict race from medical images with high performance across multiple imaging modalities. Our findings hold under external validation conditions, as well as when models are optimized to perform clinically motivated tasks. We demonstrate this detection is not due to trivial proxies or imaging-related surrogate covariates for race, such as underlying disease distribution. Finally, we show that performance persists over all anatomical regions and frequency spectrum of the images suggesting that mitigation efforts will be challenging and demand further study. Interpretation: We emphasize that model ability to predict self-reported race is itself not the issue of importance. However, our findings that AI can trivially predict self-reported race -- even from corrupted, cropped, and noised medical images -- in a setting where clinical experts cannot, creates an enormous risk for all model deployments in medical imaging: if an AI model secretly used its knowledge of self-reported race to misclassify all Black patients, radiologists would not be able to tell using the same data the model has access to.

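Motivation and objectives

Prompt closer examination of racial bias in AI performance on medical images.
Quantify how well deep learning models can detect race from medical images.
Evaluate generalization to external datasets and across imaging modalities.
Examine whether race detection depends on confounding anatomic and phenotypic population features, such as disease distribution and body habitus.
Investigate the underlying mechanism by which AI models recognize race.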

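Proposed method

Train standard deep learning models on private and public medical imaging datasets to predict self-reported race.
Evaluate performance across multiple imaging modalities and on external datasets.
Test candidate confounders, such as disease distribution and body habitus, as predictors of race.
Evaluate whether race remains predictable when models are optimized for clinically motivated tasks.
Investigate the mechanism by which AI models recognize race, beyond trivial proxies.
Evaluate robustness to corrupted, cropped, and noised images across anatomical regions and the image frequency spectrum.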
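
The frequency-spectrum evaluation in the last item can be illustrated with a small sketch. This summary does not describe how the authors filtered their images, so the code below is only one plausible setup: it assumes a circular low-pass or high-pass mask applied in the 2D Fourier domain using NumPy. The names `radial_frequency_mask` and `frequency_filter`, and the `cutoff` parameter (in cycles per pixel), are hypothetical and do not come from the paper.

```python
import numpy as np


def radial_frequency_mask(shape, cutoff, keep="low"):
    """Boolean mask over a centered 2D spectrum (hypothetical helper).

    cutoff is a radius in cycles per pixel; NumPy frequencies lie in [-0.5, 0.5).
    """
    rows, cols = shape
    # Frequency coordinates, shifted so that zero frequency sits at the center.
    fy = np.fft.fftshift(np.fft.fftfreq(rows))
    fx = np.fft.fftshift(np.fft.fftfreq(cols))
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    low = radius <= cutoff
    if keep == "low":
        return low
    if keep == "high":
        return ~low
    raise ValueError("keep must be 'low' or 'high'")


def frequency_filter(image, cutoff, keep="low"):
    """Keep only the low or high spatial frequencies of a 2D grayscale image."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    mask = radial_frequency_mask(image.shape, cutoff, keep)
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * mask))
    # The mask is symmetric, so for a real input the imaginary part is only
    # floating-point noise and can be dropped.
    return filtered.real
```

In an experiment of this kind, one would apply `frequency_filter` to each test image at several cutoffs, re-run the trained classifier on the filtered images, and track how the prediction metric changes as a function of the cutoff.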

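Experimental results

Research questions

RQ1: Can deep learning models accurately predict patients' self-reported race from medical images across multiple modalities?
RQ2: Do the models generalize to external environments beyond the training data?
RQ3: Are the models' predictions driven by confounding anatomic or phenotypic features rather than by race itself?
RQ4: Beyond imaging-related proxy variables, by what mechanism does AI recognize race from medical images?
RQ5: How well does the ability to detect race persist across anatomical regions and image frequencies?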

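Key findings

Standard deep learning models can be trained to predict race from medical images with high performance across multiple imaging modalities.
These findings hold under external validation and when models are optimized for clinically motivated tasks.
The detection is not explained by trivial proxies or imaging-related surrogate covariates for race, such as underlying disease distribution.
Performance persists across all anatomical regions and the frequency spectrum of the images, suggesting that mitigation will be challenging.
If a model secretly used race information to misclassify patients, radiologists could not detect this from the same data the model has access to.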
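
To make "high performance" in the first finding concrete, the sketch below shows one common way to score a multi-class predictor: one-vs-rest area under the ROC curve (AUROC). This summary does not state which metric the authors used, so treating AUROC as the metric is an assumption. The function names and the placeholder class labels "A" and "B" are hypothetical, and the scores are made-up illustration values, not results from the paper.

```python
def binary_auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auroc(true_classes, class_scores):
    """Per-class AUROC; class_scores is a list of {class_name: score} dicts."""
    result = {}
    for c in sorted(set(true_classes)):
        labels = [1 if t == c else 0 for t in true_classes]
        scores = [row[c] for row in class_scores]
        result[c] = binary_auroc(labels, scores)
    return result


# Made-up illustration values only (placeholder classes "A" and "B").
truth = ["A", "A", "B", "B"]
preds = [
    {"A": 0.9, "B": 0.1},
    {"A": 0.6, "B": 0.4},
    {"A": 0.65, "B": 0.35},
    {"A": 0.2, "B": 0.8},
]
per_class = one_vs_rest_auroc(truth, preds)
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so reporting the per-class values side by side shows whether performance is uniformly high or driven by a single class.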

This review was generated by AI and checked by a human editor.