QUICK REVIEW

[Paper Review] Development and Validation of Deep Learning Algorithms for Detection of Critical Findings in Head CT Scans

Sasank Chilamkurthy, Rohit Ghosh | arXiv (Cornell University) | Mar 13, 2018

Medical Imaging Techniques and Applications | References: 38 | Cited by: 82

One-Line Summary

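This paper develops and validates deep learning models that automatically detect critical findings on non-contrast head CT: intracranial hemorrhage (ICH) and its subtypes, calvarial fractures, midline shift, and mass effect. Using large multi-center datasets (Qure25k and CQ500), it reports AUCs and performance at selected operating points.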

ABSTRACT

Importance: Non-contrast head CT scan is the current standard for initial imaging of patients with head trauma or stroke symptoms. Objective: To develop and validate a set of deep learning algorithms for automated detection of the following key findings from non-contrast head CT scans: intracranial hemorrhage (ICH) and its types, intraparenchymal (IPH), intraventricular (IVH), subdural (SDH), extradural (EDH) and subarachnoid (SAH) hemorrhages, calvarial fractures, midline shift and mass effect. Design and Settings: We retrospectively collected a dataset containing 313,318 head CT scans along with their clinical reports from various centers. A part of this dataset (Qure25k dataset) was used to validate and the rest to develop algorithms. Additionally, a dataset (CQ500 dataset) was collected from different centers in two batches B1 & B2 to clinically validate the algorithms. Main Outcomes and Measures: The original clinical radiology report and the consensus of three independent radiologists were considered the gold standard for the Qure25k and CQ500 datasets, respectively. The area under the receiver operating characteristic curve (AUC) for each finding was primarily used to evaluate the algorithms. Results: Qure25k dataset contained 21,095 scans (mean age 43.31; 42.87% female) while batches B1 and B2 of CQ500 dataset consisted of 214 (mean age 43.40; 43.92% female) and 277 (mean age 51.70; 30.31% female) scans respectively. On Qure25k dataset, the algorithms achieved AUCs of 0.9194, 0.8977, 0.9559, 0.9161, 0.9288 and 0.9044 for detecting ICH, IPH, IVH, SDH, EDH and SAH respectively. AUCs for the same on CQ500 dataset were 0.9419, 0.9544, 0.9310, 0.9521, 0.9731 and 0.9574 respectively. For detecting calvarial fractures, midline shift and mass effect, AUCs on Qure25k dataset were 0.9244, 0.9276 and 0.8583 respectively, while AUCs on CQ500 dataset were 0.9624, 0.9697 and 0.9216 respectively.
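
The abstract evaluates each finding primarily by AUC, and the review also reports sensitivity and specificity at fixed operating points. Below is a minimal standard-library sketch of both. It assumes one confidence score per scan and binary labels (1 = finding present). AUC is computed in its Mann-Whitney form: the probability that a randomly chosen positive scan outscores a randomly chosen negative one. The high-sensitivity operating point is taken as the highest threshold that reaches a target sensitivity. The 0.95 target and all function names are illustrative assumptions, not the paper's exact threshold-selection procedure.

```python
from typing import List, Tuple


def auc(scores: List[float], labels: List[int]) -> float:
    """ROC AUC as P(score of a positive > score of a negative), ties counted as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative scan")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sens_spec(scores: List[float], labels: List[int], threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when scans with score >= threshold are called positive."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative scan")
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    return tp / n_pos, tn / n_neg


def high_sensitivity_point(scores: List[float], labels: List[int],
                           min_sens: float = 0.95) -> Tuple[float, float, float]:
    """Highest threshold with sensitivity >= min_sens, as (threshold, sensitivity, specificity).

    Lowering the threshold never decreases sensitivity and never increases
    specificity, so the highest qualifying threshold keeps specificity as high as possible.
    """
    for t in sorted(set(scores), reverse=True):
        sens, spec = sens_spec(scores, labels, t)
        if sens >= min_sens:
            return t, sens, spec
    raise ValueError("min_sens must be <= 1.0")


# Toy usage: six scans, three with the finding.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
print(auc(scores, labels))                     # 0.8888888888888888 (= 8/9)
print(high_sensitivity_point(scores, labels))  # (0.6, 1.0, 0.6666666666666666)
```

A high-specificity point would be chosen symmetrically, by scanning thresholds from the low end until specificity reaches its target. The pairwise AUC loop is quadratic in the number of scans. That is fine for illustration, but a rank-based or library implementation would be used at the scale of Qure25k.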

Motivation and Objectives

Motivation: enable automated triage and rapid identification of emergent head CT findings to reduce delays in treatment.
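Build large multi-center datasets (Qure25k and CQ500), using the original radiology reports as the gold standard for Qure25k and a consensus of expert radiologists for CQ500.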
Train separate deep learning models for intracranial hemorrhage, calvarial fractures, and midline shift/mass effect.
Report per-finding performance metrics to support clinical deployment and benchmarking.

Proposed Method

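Train a slice-level hemorrhage classifier on a ResNet18 backbone with five parallel fully connected (FC) layers, one per hemorrhage type, and combine the slice-level confidences into a scan-level prediction with a random forest.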
Train dense segmentation models (UNet) for IPH, SDH, and EDH; for calvarial fracture detection, use a DeepLab-based approach with hard negative mining to cope with the rarity of fractures.
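For midline shift and mass effect, use a two-branch approach: a modified ResNet18 with parallel FC layers, whose slice-level outputs are aggregated by a random forest into a scan-level confidence.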
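Preprocess by extracting the non-contrast axial series, resampling it to 5 mm slices, resizing to 224×224, and stacking brain, bone, and subdural windows as input channels.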
Use ROC curves with AUC as the primary metric, and report sensitivity and specificity at high-sensitivity and high-specificity operating points.
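
The slice-to-scan design used for hemorrhage, midline shift, and mass effect can be sketched in two steps. First, parallel FC heads on a shared slice feature vector produce independent sigmoid outputs. Second, the per-slice probabilities are reduced to a fixed-length summary that a random forest could consume. This is a minimal standard-library sketch under stated assumptions. The toy feature vectors stand in for ResNet18 embeddings, and the outputs are assumed to be independent sigmoids. The summary features (max, mean, top-k mean, fraction above 0.5) are hypothetical, since the review does not say which statistics the paper fed to its random forest.

```python
import math
from typing import Dict, List

# Illustrative head names: one independent output per hemorrhage subtype.
SUBTYPES = ["IPH", "IVH", "SDH", "EDH", "SAH"]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def slice_probs(features: List[float],
                weights: Dict[str, List[float]],
                biases: Dict[str, float]) -> Dict[str, float]:
    """Score one slice with parallel linear heads on a shared feature vector.

    Each head has its own weights and its own sigmoid, so subtypes are scored
    independently (multi-label), not as a softmax over exclusive classes.
    """
    out = {}
    for name in SUBTYPES:
        z = sum(w * x for w, x in zip(weights[name], features)) + biases[name]
        out[name] = sigmoid(z)
    return out


def scan_features(probs: List[float], top_k: int = 3, threshold: float = 0.5) -> List[float]:
    """Reduce a variable number of slice probabilities to a fixed-length vector.

    Hypothetical features: [max, mean, mean of top-k, fraction of slices >= threshold].
    """
    if not probs:
        raise ValueError("scan has no slices")
    ranked = sorted(probs, reverse=True)
    k = min(top_k, len(ranked))
    return [
        ranked[0],
        sum(probs) / len(probs),
        sum(ranked[:k]) / k,
        sum(p >= threshold for p in probs) / len(probs),
    ]


# Toy usage: 2-d "features"; only the SDH head has non-zero weights.
weights = {name: [0.0, 0.0] for name in SUBTYPES}
weights["SDH"] = [2.0, -1.0]
biases = {name: 0.0 for name in SUBTYPES}
slices = [[0.1, 0.0], [1.5, 0.5], [2.0, 0.0], [-1.0, 1.0]]
sdh = [slice_probs(f, weights, biases)["SDH"] for f in slices]
print(scan_features(sdh))
```

Summarizing the probabilities gives every scan the same feature length regardless of its slice count. With scikit-learn installed, one such vector per scan could be passed as a row of `X` to `RandomForestClassifier().fit(X, y)`. That library choice is an assumption; the review does not specify it.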

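The preprocessing step stacks three CT windows as input channels. Below is a minimal standard-library sketch of that windowing, assuming each slice is already in Hounsfield units (HU). The level/width pairs are commonly used values chosen for illustration; the review does not list the paper's exact settings. Resampling to 5 mm and resizing to 224×224 are omitted.

```python
from typing import Dict, List, Tuple

Image = List[List[float]]  # one axial slice in Hounsfield units, rows x columns

# ASSUMED (level, width) pairs in HU: commonly used values, not taken from the paper.
WINDOWS: Dict[str, Tuple[float, float]] = {
    "brain": (40.0, 80.0),
    "bone": (600.0, 2800.0),
    "subdural": (80.0, 200.0),
}


def apply_window(img: Image, level: float, width: float) -> Image:
    """Map HU in [level - width/2, level + width/2] linearly to [0, 1], clipping outside."""
    if width <= 0:
        raise ValueError("window width must be positive")
    low = level - width / 2.0
    return [[min(max((v - low) / width, 0.0), 1.0) for v in row] for row in img]


def stack_windows(img: Image) -> List[Image]:
    """Return a channel-first 3-channel image: brain, bone, subdural windows."""
    return [apply_window(img, *WINDOWS[name]) for name in ("brain", "bone", "subdural")]


# Toy usage: a 1 x 3 "slice" holding air, soft-tissue, and bone-like HU values.
slice_hu = [[-1000.0, 40.0, 1000.0]]
channels = stack_windows(slice_hu)
print([ch[0] for ch in channels])
```

Stacking windows gives a network with a three-channel input, such as an ImageNet-style ResNet18, three contrast settings of the same slice. Each setting emphasizes different structures: parenchyma and blood, bone, and thin extra-axial collections.
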
Experimental Results

Research Questions

RQ1: Can deep learning models accurately detect the five types of intracranial hemorrhage on non-contrast head CT scans from diverse centers?
RQ2: Can calvarial fractures, midline shift, and mass effect be detected reliably, and how does performance compare against radiologist consensus?
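RQ3: How well does performance generalize from Qure25k (held out from the same data pool used for development) to the independent clinical validation dataset CQ500?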
RQ4: How does using a multi-radiologist consensus rather than a single-reader gold standard (the original clinical report) affect measured performance?
RQ5: Could an automated triage system that reliably prioritizes scans shorten time to treatment in busy or remote settings?

Key Findings

On Qure25k: ICH 0.9194, IPH 0.8977, IVH 0.9559, SDH 0.9161, EDH 0.9288, SAH 0.9044, calvarial fracture 0.9244, midline shift 0.9276, mass effect 0.8583.
On CQ500 (B1+B2): ICH 0.9419, IPH 0.9544, IVH 0.9310, SDH 0.9521, EDH 0.9731, SAH 0.9574, calvarial fracture 0.9624, midline shift 0.9697, mass effect 0.9216.
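At the high-sensitivity operating point on CQ500, sensitivities were 0.9463 (ICH), 0.9487 (calvarial fracture), and 0.9385 (midline shift), with corresponding specificities of 0.7098, 0.8606, and 0.8944.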
AUCs were higher on CQ500 than on Qure25k for every finding except IVH (0.9310 vs 0.9559); mass effect showed the largest gap (0.9216 vs 0.8583).
On CQ500, inter-reader agreement was higher for ICH (Fleiss' kappa 0.7827) and IPH (0.7746) than for calvarial fractures (0.4507) and SDH (0.5418).
The study releases the CQ500 dataset publicly for benchmarking and demonstrates per-finding deep learning performance on head CT.
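
Reader agreement on CQ500 is summarized with Fleiss' kappa. Below is a minimal standard-library sketch of the standard formula. It assumes every scan is rated by the same number of readers (three for CQ500, per the abstract). `counts[i][j]` is the number of readers who put scan i in category j, for example [absent, present]. The function name and toy ratings are illustrative.

```python
from typing import List


def fleiss_kappa(counts: List[List[int]]) -> float:
    """Fleiss' kappa for N items, each rated by the same number n >= 2 of raters.

    counts[i][j] is the number of raters who assigned item i to category j.
    """
    if not counts:
        raise ValueError("no items")
    n = sum(counts[0])
    if n < 2 or any(sum(row) != n for row in counts):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    N, k = len(counts), len(counts[0])
    # P_i: fraction of rater pairs that agree on item i.
    p_items = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    p_bar = sum(p_items) / N
    # p_j: overall share of ratings in category j; chance agreement P_e = sum of p_j^2.
    p_cat = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    p_e = sum(p * p for p in p_cat)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_e) / (1.0 - p_e)


# Toy usage: 3 readers, categories [absent, present], four scans.
ratings = [[3, 0], [0, 3], [2, 1], [3, 0]]
print(fleiss_kappa(ratings))
```

Kappa corrects raw agreement for agreement expected by chance from the category proportions. This is why a rare finding can show a modest kappa even when readers usually agree that it is absent.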

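The per-finding AUC gap between the two datasets can be checked directly from the numbers reported in the abstract. The dictionary name below is illustrative; the values are copied from the abstract as (Qure25k, CQ500) pairs.

```python
# AUC pairs (Qure25k, CQ500) as reported in the abstract.
REPORTED_AUCS = {
    "ICH": (0.9194, 0.9419),
    "IPH": (0.8977, 0.9544),
    "IVH": (0.9559, 0.9310),
    "SDH": (0.9161, 0.9521),
    "EDH": (0.9288, 0.9731),
    "SAH": (0.9044, 0.9574),
    "Calvarial fracture": (0.9244, 0.9624),
    "Midline shift": (0.9276, 0.9697),
    "Mass effect": (0.8583, 0.9216),
}

# Positive gap = higher AUC on CQ500; rounded to the 4 decimals reported.
gaps = {name: round(cq500 - qure25k, 4) for name, (qure25k, cq500) in REPORTED_AUCS.items()}
largest = max(gaps, key=gaps.get)
lower_on_cq500 = [name for name, gap in gaps.items() if gap < 0]

print(largest, gaps[largest])  # Mass effect 0.0633
print(lower_on_cq500)          # ['IVH']
```
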
This review was generated by AI and checked by a human editor.