QUICK REVIEW

[Paper Review] Pathologist-Level Grading of Prostate Biopsies with Artificial Intelligence

Peter Ström, Kimmo Kartasalo|arXiv (Cornell University)|Jul 2, 2019

Prostate Cancer Diagnosis and Treatment | References: 33 | Cited by: 12

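One-Sentence Summary

This study developed a deep learning system that grades prostate biopsies with pathologist-level accuracy, using whole-slide images from the population-based STHLM3 study. Trained on 6,682 biopsies and tested on 1,631 independent biopsies, the system reached an area under the ROC curve (AUC) of 0.997 for detecting cancer in individual biopsy cores and 0.999 for patient-level cancer prediction, and an average pairwise Cohen's kappa of 0.62 for Gleason grading, which is within the range of expert pathologists (0.60 to 0.73). These results suggest the system could reduce both grading variability and workload in prostate cancer pathology.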

ABSTRACT

Background: An increasing volume of prostate biopsies and a worldwide shortage of uro-pathologists put a strain on pathology departments. Additionally, the high intra- and inter-observer variability in grading can result in over- and undertreatment of prostate cancer. Artificial intelligence (AI) methods may alleviate these problems by assisting pathologists to reduce workload and harmonize grading.

Methods: We digitized 6,682 needle biopsies from 976 participants in the population-based STHLM3 diagnostic study to train deep neural networks for assessing prostate biopsies. The networks were evaluated by predicting the presence, extent, and Gleason grade of malignant tissue for an independent test set comprising 1,631 biopsies from 245 men. We additionally evaluated grading performance on 87 biopsies individually graded by 23 experienced urological pathologists from the International Society of Urological Pathology. We assessed discriminatory performance by receiver operating characteristics (ROC) and tumor extent predictions by correlating predicted millimeter cancer length against measurements by the reporting pathologist. We quantified the concordance between grades assigned by the AI and the expert urological pathologists using Cohen's kappa.

Results: The performance of the AI to detect and grade cancer in prostate needle biopsy samples was comparable to that of international experts in prostate pathology. The AI achieved an area under the ROC curve of 0.997 for distinguishing between benign and malignant biopsy cores, and 0.999 for distinguishing between men with or without prostate cancer. The correlation between millimeter cancer predicted by the AI and assigned by the reporting pathologist was 0.96. For assigning Gleason grades, the AI achieved an average pairwise kappa of 0.62. This was within the range of the corresponding values for the expert pathologists (0.60 to 0.73).
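The area under the ROC curve reported in the abstract can be read as the probability that a randomly chosen malignant core receives a higher score than a randomly chosen benign one. The following is a minimal sketch of that computation, not the authors' evaluation code. The function name `roc_auc` and the toy labels and scores are assumptions made up for illustration.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: 1 for malignant, 0 for benign.
    scores: model outputs, where higher means more suspicious.
    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up toy data, not taken from the study.
toy_labels = [0, 0, 1, 1, 0, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]
print(roc_auc(toy_labels, toy_scores))
```

The pairwise loop is quadratic in the number of samples, which is fine for a small illustration. A sort-based ranking would be the usual choice for thousands of biopsies.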

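Motivation and Objectives

Address the growing workload in prostate cancer diagnosis and the shortage of urological pathologists.
Reduce the high intra- and inter-observer variability in Gleason grading of prostate biopsies.
Develop an AI system that can detect, localize, and grade prostate cancer with clinical-grade accuracy.
Compare the AI system's performance against expert pathologists using standardized metrics.
Demonstrate the clinical feasibility of AI in population-based prostate cancer screening.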

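Proposed Method

Digitized 8,313 prostate biopsy whole-slide images from the STHLM3 study: 6,682 for training and 1,631 for independent testing.
Trained deep neural networks (DNNs) as an ensemble based on the Inception V3, ResNet-50, and Xception architectures.
Tuned hyperparameters by cross-validation on the training data.
Applied transfer learning from ImageNet-pretrained weights to improve generalization to prostate histopathology.
Used an XGBoost regression model on DNN features to predict cancer length in millimeters.
Validated performance with receiver operating characteristic (ROC) curves, correlation analysis, and Cohen's kappa for grading concordance.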
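One common way to combine several networks into an ensemble is soft voting, where the per-class probabilities from each model are averaged before a label is chosen. The summary does not say how the ensemble members were combined, so the sketch below is an assumption and not the paper's procedure. The function names, the class names, and the probability values are all hypothetical.

```python
def ensemble_average(per_model_probs):
    """Average class-probability vectors from several models (soft voting)."""
    if not per_model_probs:
        raise ValueError("need at least one model's output")
    n_classes = len(per_model_probs[0])
    if any(len(p) != n_classes for p in per_model_probs):
        raise ValueError("all models must predict the same set of classes")
    n_models = len(per_model_probs)
    return [sum(p[c] for p in per_model_probs) / n_models
            for c in range(n_classes)]


def predict_class(per_model_probs, class_names):
    """Return the class with the highest averaged probability, plus the averages."""
    averaged = ensemble_average(per_model_probs)
    best = max(range(len(averaged)), key=averaged.__getitem__)
    return class_names[best], averaged


# Hypothetical patch-level classes and made-up outputs from three models.
class_names = ["benign", "Gleason 3", "Gleason 4", "Gleason 5"]
patch_probs = [
    [0.10, 0.60, 0.25, 0.05],  # model 1
    [0.05, 0.40, 0.50, 0.05],  # model 2
    [0.10, 0.55, 0.30, 0.05],  # model 3
]
label, averaged = predict_class(patch_probs, class_names)
print(label, averaged)
```

In this toy case model 2 alone would vote for Gleason 4, but the averaged probabilities favor Gleason 3. Averaging is used here because it dampens the effect of a single model's disagreement.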

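Experimental Results

Research Questions

RQ1: Can the AI system detect prostate cancer in biopsy samples with pathologist-level accuracy?
RQ2: How does the AI system's Gleason grading compare with that of expert urological pathologists?
RQ3: To what extent can AI reduce inter-observer variability in prostate cancer grading?
RQ4: How closely does the AI-predicted cancer length (in millimeters) agree with the pathologist's measurements?
RQ5: Can the AI be applied reliably in a real-world, population-based screening setting?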
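For RQ4, agreement between predicted and measured cancer length is reported as a correlation. The source does not state which correlation coefficient was used, so the sketch below assumes Pearson's r. The function name `pearson_r` and the millimeter values are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up cancer lengths in millimeters, not taken from the study.
predicted_mm = [0.0, 2.1, 5.3, 7.8, 12.0]
measured_mm = [0.0, 2.5, 5.0, 8.0, 11.5]
print(round(pearson_r(predicted_mm, measured_mm), 3))
```

A high correlation shows that the two sets of lengths rise and fall together. It does not by itself rule out a constant offset between them, so a plot or a mean-difference check is a useful companion.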

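Key Findings

The AI achieved an area under the ROC curve (AUC) of 0.997 for distinguishing benign from malignant biopsy cores.
For classifying whether a patient has prostate cancer, the AUC was 0.999.
The correlation between AI-predicted cancer length and the pathologist's measurement was 0.96.
The AI's average pairwise Cohen's kappa for Gleason grading was 0.62, within the range of the expert pathologists (0.60 to 0.73).
The AI performed robustly across multiple histological subtypes and difficult cases, including atypical hyperplasia and prostatic intraepithelial neoplasia.
The model maintained high performance across different biopsy cores and medical institutions, indicating strong generalization.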
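The average pairwise kappa in the findings above compares one rater's grades against each of the other raters in turn and averages the resulting kappa values. The sketch below uses unweighted Cohen's kappa, which is an assumption, since the summary does not say whether a weighted variant was used. The function names and the toy grades are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa: agreement between two raters beyond chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal category frequencies.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one and the same category throughout
    return (observed - expected) / (1.0 - expected)


def mean_pairwise_kappa(target, others):
    """Average kappa of one rater (e.g. the AI) against each other rater."""
    if not others:
        raise ValueError("need at least one other rater")
    return sum(cohen_kappa(target, o) for o in others) / len(others)


# Made-up grade groups (1 to 5) for eight biopsies, not taken from the study.
ai_grades = [1, 2, 2, 3, 5, 4, 1, 3]
pathologist_1 = [1, 2, 3, 3, 5, 4, 1, 2]
pathologist_2 = [1, 2, 2, 3, 4, 4, 2, 3]
print(round(mean_pairwise_kappa(ai_grades, [pathologist_1, pathologist_2]), 2))
```

Computing the same average for each human rater against the remaining raters gives the reference range that the AI's value is compared with.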

This review was generated by AI and checked by a human editor.