QUICK REVIEW

[Paper Review] Pathologist-Level Grading of Prostate Biopsies with Artificial Intelligence

Peter Ström, Kimmo Kartasalo | arXiv (Cornell University) | Jul 2, 2019

Prostate Cancer Diagnosis and Treatment | References: 33 | Citations: 12

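One-Line Summary

This study developed a deep learning AI system that grades prostate biopsies with pathologist-level accuracy, using whole-slide images from the population-based STHLM3 study. Trained on 6,682 biopsies and evaluated on 1,631 independent test biopsies, the AI achieved an AUC of 0.997 for cancer detection and an AUC of 0.999 for patient-level cancer prediction. On a separate set of 87 biopsies graded by 23 expert urological pathologists, its average pairwise Cohen's kappa for Gleason grading was 0.62, within the experts' own range. The results indicate strong potential to reduce diagnostic variability and workload in prostate cancer pathology.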

ABSTRACT

Background: An increasing volume of prostate biopsies and a world-wide shortage of uro-pathologists puts a strain on pathology departments. Additionally, the high intra- and inter-observer variability in grading can result in over- and undertreatment of prostate cancer. Artificial intelligence (AI) methods may alleviate these problems by assisting pathologists to reduce workload and harmonize grading. Methods: We digitized 6,682 needle biopsies from 976 participants in the population based STHLM3 diagnostic study to train deep neural networks for assessing prostate biopsies. The networks were evaluated by predicting the presence, extent, and Gleason grade of malignant tissue for an independent test set comprising 1,631 biopsies from 245 men. We additionally evaluated grading performance on 87 biopsies individually graded by 23 experienced urological pathologists from the International Society of Urological Pathology. We assessed discriminatory performance by receiver operating characteristics (ROC) and tumor extent predictions by correlating predicted millimeter cancer length against measurements by the reporting pathologist. We quantified the concordance between grades assigned by the AI and the expert urological pathologists using Cohen's kappa. Results: The performance of the AI to detect and grade cancer in prostate needle biopsy samples was comparable to that of international experts in prostate pathology. The AI achieved an area under the ROC curve of 0.997 for distinguishing between benign and malignant biopsy cores, and 0.999 for distinguishing between men with or without prostate cancer. The correlation between millimeter cancer predicted by the AI and assigned by the reporting pathologist was 0.96. For assigning Gleason grades, the AI achieved an average pairwise kappa of 0.62. This was within the range of the corresponding values for the expert pathologists (0.60 to 0.73).

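Motivation and Objectives

Address the growing workload in prostate cancer diagnosis and the shortage of urological pathologists.
Reduce the high intra- and inter-observer variability in Gleason grading of prostate biopsies.
Develop an AI system that can detect, localize, and grade cancer with clinical-grade accuracy.
Compare the AI's performance against expert pathologists using standardized metrics.
Demonstrate the clinical validity of the AI for population-based prostate cancer screening.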

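Proposed Method

Digitized 8,313 prostate biopsy whole-slide images from the STHLM3 study, assigning 6,682 to training and 1,631 to an independent test set.
Trained deep neural networks (DNNs) using an ensemble model based on the Inception V3, ResNet-50, and Xception architectures.
Optimized model performance through hyperparameter tuning with cross-validation on the training data.
Applied transfer learning from ImageNet pretraining to improve generalization to prostate histology images.
Used XGBoost regression to predict cancer length (in mm) from DNN features.
Validated performance using receiver operating characteristic (ROC) curves, correlation analysis, and Cohen's kappa for grading concordance.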
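
The two simpler evaluation metrics named above, the area under the ROC curve and the correlation between predicted and measured cancer length, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `roc_auc` and `pearson_r` and all of the toy numbers are assumptions made for illustration, and Pearson correlation is assumed here because the review only says "correlation".

```python
from itertools import product
from math import sqrt


def roc_auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy data (hypothetical, not from the paper): 1 = malignant core, 0 = benign.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
auc = roc_auc(labels, scores)

# Toy cancer lengths in mm (hypothetical): model prediction vs. pathologist.
predicted_mm = [0.0, 2.0, 5.5, 9.0]
pathologist_mm = [0.0, 2.5, 5.0, 10.0]
r = pearson_r(predicted_mm, pathologist_mm)

print(f"AUC = {auc:.3f}, r = {r:.3f}")
```

The pairwise formulation of AUC is quadratic in the number of cases, which is fine for a small illustration; a rank-based implementation would be the usual choice for thousands of biopsies.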

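Experimental Results

Research Questions

RQ1: Can the AI system achieve pathologist-level accuracy in detecting prostate cancer in biopsy specimens?
RQ2: How does the AI's Gleason grading performance compare with that of expert urological pathologists?
RQ3: To what extent can the AI reduce inter-observer variability in prostate cancer grading?
RQ4: How accurately can the AI predict cancer extent (in mm) compared with measurements by pathologists?
RQ5: Can the AI be applied reliably in a real-world, population-based screening setting?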

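Key Findings

The area under the ROC curve (AUC) for distinguishing benign from malignant biopsy cores was 0.997.
The AUC for classifying whether a patient has prostate cancer was 0.999.
The correlation between the cancer length predicted by the AI and the length assigned by the reporting pathologist was 0.96.
The AI's average pairwise Cohen's kappa for Gleason grading was 0.62, within the range of the expert pathologists (0.60 to 0.73).
The AI performed robustly on diverse histological variants and difficult cases, including atypia and prostatic intraepithelial neoplasia.
Performance remained high across different biopsy cores and hospitals, suggesting good generalizability.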
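
To make the grading-concordance figure concrete, here is a minimal sketch of Cohen's kappa and of one possible reading of "average pairwise kappa": the mean of the kappas between one rater (for example the AI) and each of the other raters. The function names, the toy grade lists, and the use of unweighted kappa are assumptions for illustration only; the paper may use a weighted variant or a different averaging scheme.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa between two raters' labels for the same cases."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(a)
    # Observed agreement: fraction of cases where both raters give the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


def mean_pairwise_kappa(target, others):
    """Average kappa between one rater and each of the other raters."""
    if not others:
        raise ValueError("need at least one other rater")
    return sum(cohen_kappa(target, o) for o in others) / len(others)


# Toy grade groups (hypothetical, not from the paper) for eight biopsies.
ai_grades = [1, 2, 2, 3, 5, 1, 4, 2]
pathologist_1 = [1, 2, 3, 3, 5, 1, 4, 2]
pathologist_2 = [1, 2, 2, 3, 4, 2, 4, 2]

avg_kappa = mean_pairwise_kappa(ai_grades, [pathologist_1, pathologist_2])
print(f"average pairwise kappa = {avg_kappa:.3f}")
```

Because kappa discounts agreement expected by chance, a value such as 0.62 can coexist with a much higher raw agreement rate, which is why the comparison against the experts' own range (0.60 to 0.73) is the meaningful benchmark.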

This review was generated by AI and checked by a human editor.