QUICK REVIEW

[Paper Review] Development and Validation of Deep Learning Algorithms for Detection of Critical Findings in Head CT Scans

Sasank Chilamkurthy, Rohit Ghosh|arXiv (Cornell University)|Mar 13, 2018

Medical Imaging Techniques and Applications | References: 38 | Cited by: 82

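One-Sentence Summary

The paper develops and validates deep learning models that automatically detect critical findings in non-contrast head CT scans (ICH and its subtypes, calvarial fractures, midline shift, and mass effect), using large multi-center datasets (Qure25k and CQ500) and reporting AUCs and operating-point performance.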

ABSTRACT

Importance: Non-contrast head CT scan is the current standard for initial imaging of patients with head trauma or stroke symptoms. Objective: To develop and validate a set of deep learning algorithms for automated detection of following key findings from non-contrast head CT scans: intracranial hemorrhage (ICH) and its types, intraparenchymal (IPH), intraventricular (IVH), subdural (SDH), extradural (EDH) and subarachnoid (SAH) hemorrhages, calvarial fractures, midline shift and mass effect. Design and Settings: We retrospectively collected a dataset containing 313,318 head CT scans along with their clinical reports from various centers. A part of this dataset (Qure25k dataset) was used to validate and the rest to develop algorithms. Additionally, a dataset (CQ500 dataset) was collected from different centers in two batches B1 & B2 to clinically validate the algorithms. Main Outcomes and Measures: Original clinical radiology report and consensus of three independent radiologists were considered as gold standard for Qure25k and CQ500 datasets respectively. Area under receiver operating characteristics curve (AUC) for each finding was primarily used to evaluate the algorithms. Results: Qure25k dataset contained 21,095 scans (mean age 43.31; 42.87% female) while batches B1 and B2 of CQ500 dataset consisted of 214 (mean age 43.40; 43.92% female) and 277 (mean age 51.70; 30.31% female) scans respectively. On Qure25k dataset, the algorithms achieved AUCs of 0.9194, 0.8977, 0.9559, 0.9161, 0.9288 and 0.9044 for detecting ICH, IPH, IVH, SDH, EDH and SAH respectively. AUCs for the same on CQ500 dataset were 0.9419, 0.9544, 0.9310, 0.9521, 0.9731 and 0.9574 respectively. For detecting calvarial fractures, midline shift and mass effect, AUCs on Qure25k dataset were 0.9244, 0.9276 and 0.8583 respectively, while AUCs on CQ500 dataset were 0.9624, 0.9697 and 0.9216 respectively.
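
Since AUC is the primary metric throughout, a minimal sketch of how it can be computed may help. The function below uses the rank-based definition (the probability that a randomly chosen positive scan scores above a randomly chosen negative one, with ties counted as one half). The function name and the toy labels and scores are hypothetical illustrations, not the authors' code or data.

```python
def auc(labels, scores):
    """Rank-based AUC: fraction of (positive, negative) pairs in which the
    positive scan receives the higher score; ties count as one half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative scan")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical scan-level confidences for a single finding (not paper data).
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.92, 0.75, 0.40, 0.55, 0.30, 0.20, 0.10]
toy_auc = auc(toy_labels, toy_scores)
print(f"toy AUC = {toy_auc:.4f}")
```

The pairwise loop is quadratic in the number of scans. It is kept here for readability; a sort-based formulation would be preferable at the scale of Qure25k.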

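Research Motivation and Objectives

Automate triage and rapidly identify urgent head CT findings in order to reduce treatment delays.
Build large, multi-center datasets (Qure25k and CQ500), with the original radiology reports and expert consensus as the respective gold standards.
Train separate deep learning models for intracranial hemorrhage, fractures, and mass effect/midline shift.
Report per-finding performance metrics to support clinical deployment and benchmark comparison.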

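Proposed Method

Train a slice-level hemorrhage classifier based on ResNet18 with five parallel fully connected layers, one per hemorrhage type, and combine the slice-level confidences with a random forest to produce scan-level predictions.
Train dense segmentation models (UNet) for IPH, SDH, and EDH, and use a DeepLab-based approach for calvarial fracture detection, with hard negative mining to address sparsity.
For midline shift and mass effect, use a two-branch approach: a modified ResNet18 with parallel fully connected layers, with slice-level outputs aggregated by a random forest into scan-level confidences.
Preprocess each CT scan by selecting the axial non-contrast series, resampling to 5 mm, resizing to 224x224, and stacking the brain, bone, and subdural windows as channels.
Evaluate with ROC curves, using AUC as the primary metric, and report sensitivity and specificity at a high-sensitivity and a high-specificity operating point.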
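
The window-stacking part of the preprocessing can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the (level, width) values in WINDOWS are assumed illustrative presets rather than numbers taken from this review, the function names are hypothetical, and series selection, resampling, and resizing are omitted.

```python
def apply_window(hu, level, width):
    """Clip one Hounsfield-unit value to a window and rescale it to [0, 1]."""
    low = level - width / 2.0
    high = level + width / 2.0
    clipped = min(max(hu, low), high)
    return (clipped - low) / (high - low)


# ASSUMED (level, width) pairs in HU; illustrative presets, not from the paper.
WINDOWS = {
    "brain": (40, 80),
    "bone": (500, 3000),
    "subdural": (80, 200),
}


def stack_windows(slice_hu):
    """Turn one 2D slice (a list of rows of HU values) into three channels,
    one per window, in the order brain, bone, subdural."""
    return [
        [[apply_window(v, level, width) for v in row] for row in slice_hu]
        for level, width in WINDOWS.values()
    ]


# Toy 2x2 slice: roughly air, water, soft tissue, dense bone.
toy_slice = [[-1000, 0], [40, 1000]]
channels = stack_windows(toy_slice)
for name, channel in zip(WINDOWS, channels):
    print(name, channel)
```

Stacking the windows as channels lets a network pretrained on three-channel images see soft-tissue and bone contrast at the same time, instead of one compressed intensity range.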

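Experimental Results

Research Questions

RQ1: Can deep learning models accurately detect the five types of intracranial hemorrhage on non-contrast head CT across multiple centers?
RQ2: Can the models reliably detect calvarial fractures, midline shift, and mass effect, and how do they compare with radiologist consensus?
RQ3: How well do the models generalize from the development dataset (Qure25k) to the independent clinical validation dataset (CQ500)?
RQ4: How does using a majority consensus of radiologists, rather than a single reader, as the gold standard affect measured performance?
RQ5: In busy or remote settings, can an automated triage system shorten time to treatment by triaging scans reliably?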

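Key Findings

On Qure25k, the AUC was 0.9194 for ICH, 0.9559 for IVH, 0.9276 for midline shift, 0.9244 for calvarial fracture, and 0.8583 for mass effect.
On CQ500 (B1+B2), the AUC was 0.9419 for ICH, 0.9544 for IPH, 0.9310 for IVH, 0.9521 for SDH, 0.9731 for EDH, 0.9574 for SAH, 0.9624 for calvarial fracture, 0.9697 for midline shift, and 0.9216 for mass effect.
At the high-sensitivity operating point on CQ500, sensitivity was 0.9463 (ICH), 0.9487 (calvarial fracture), and 0.9385 (midline shift), with specificities of 0.7098, 0.8606, and 0.8944, respectively.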
AUCs were higher on CQ500 than on Qure25k for every finding except IVH (0.9310 vs 0.9559); the largest gap was for mass effect (0.9216 vs 0.8583).
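On CQ500, inter-reader agreement was higher for ICH (Fleiss' κ = 0.7827) and IPH (0.7746) and lower for calvarial fracture (0.4507) and SDH (0.5418).
The study makes the CQ500 dataset publicly available for benchmarking and reports deep learning performance for each head CT finding.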
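
The comparison between the two datasets can be checked directly from the AUCs reported in the abstract. The short sketch below tabulates the per-finding gap (CQ500 minus Qure25k); the variable names are illustrative, and the numbers are copied from the abstract.

```python
# AUCs as reported in the abstract: finding -> (Qure25k, CQ500).
REPORTED_AUC = {
    "ICH": (0.9194, 0.9419),
    "IPH": (0.8977, 0.9544),
    "IVH": (0.9559, 0.9310),
    "SDH": (0.9161, 0.9521),
    "EDH": (0.9288, 0.9731),
    "SAH": (0.9044, 0.9574),
    "calvarial fracture": (0.9244, 0.9624),
    "midline shift": (0.9276, 0.9697),
    "mass effect": (0.8583, 0.9216),
}

# Gap = CQ500 AUC minus Qure25k AUC, rounded to the reported precision.
gaps = {
    name: round(cq500 - qure25k, 4)
    for name, (qure25k, cq500) in REPORTED_AUC.items()
}
largest_gain = max(gaps, key=gaps.get)
lower_on_cq500 = [name for name, gap in gaps.items() if gap < 0]

# Print findings from the largest gain to the largest drop.
for name, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<20}{gap:+.4f}")
```

A plain difference of AUCs ignores that the two datasets use different gold standards (clinical reports vs. three-radiologist consensus), so the gaps are descriptive rather than a like-for-like measure of generalization.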

This review was generated by AI and checked by a human editor.