QUICK REVIEW

[Paper Review] Deep EHR: Chronic Disease Prediction Using Medical Notes

Jingshu Liu, Zachariah Zhang | arXiv (Cornell University) | Aug 15, 2018

Machine Learning in Healthcare | References: 9 | Citations: 48

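One-Line Summary

This paper introduces a multi-task deep learning framework that combines unstructured medical notes with structured data to predict the onset of chronic diseases. Adding the notes improves performance, and handling negations in the text improves accuracy further.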

ABSTRACT

Early detection of preventable diseases is important for better disease management, improved interventions, and more efficient health-care resource allocation. Various machine learning approaches have been developed to utilize information in Electronic Health Record (EHR) for this task. Majority of previous attempts, however, focus on structured fields and lose the vast amount of information in the unstructured notes. In this work we propose a general multi-task framework for disease onset prediction that combines both free-text medical notes and structured information. We compare performance of different deep learning architectures including CNN, LSTM and hierarchical models. In contrast to traditional text-based prediction models, our approach does not require disease specific feature engineering, and can handle negations and numerical values that exist in the text. Our results on a cohort of about 1 million patients show that models using text outperform models using just structured data, and that models capable of using numerical values and negations in the text, in addition to the raw text, further improve performance. Additionally, we compare different visualization methods for medical professionals to interpret model predictions.

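Motivation and Objectives

Enable early detection of preventable diseases by exploiting the rich information accumulated in EHRs.
Propose a general multi-task architecture that fuses free-text notes with structured numerical data.
Evaluate a range of deep learning architectures (CNN, LSTM, hierarchical models, and others) on large-scale real-world EHR data.
Handle negations in the text and provide visualization methods that support clinical interpretation.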

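Proposed Method

Represent notes with word embeddings (PubMed and in-house StarSpace embeddings) and feed them into a CNN, an LSTM, or a hierarchical model.
Add numerical lab and vital-sign values extracted from the notes, together with demographic information, as additional inputs.
Use a multi-task framework with one output per disease and a masked loss to handle overlapping patient-disease eligibility (a given patient may be eligible for predicting some target diseases but not others).
Apply a negation-tagging step (NegEx) to improve how negated findings (for example, "denies chest pain") are interpreted.
Compare baseline models (logistic regression on lab/demographic features and on TF-IDF text features) against the deep learning models.
Provide a log-odds-based visualization method that identifies high-impact n-grams and phrases.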
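
The CNN encoder can be pictured with a forward-pass-only sketch in plain Python. Convolution filters slide over the word embeddings, each filter's ReLU outputs are max-pooled over time, and the pooled features feed one sigmoid head per disease. The random embeddings, the filter widths and dimensions, and the `predict_onsets` helper are hypothetical. There is no training step, and the review does not give the paper's layer sizes. In the paper, the embeddings would come from PubMed or StarSpace rather than a random generator.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]


def conv_relu_maxpool(emb: List[Vector], filters: List[List[Vector]]) -> Vector:
    """One feature per filter: max over time of ReLU(filter . window)."""
    feats: Vector = []
    for f in filters:  # f has shape (width, dim)
        best = 0.0  # starting at 0 folds the ReLU into the max-pooling
        for start in range(len(emb) - len(f) + 1):
            window = emb[start:start + len(f)]
            s = sum(w * x for f_row, e_row in zip(f, window)
                    for w, x in zip(f_row, e_row))
            best = max(best, s)
        feats.append(best)
    return feats


def predict_onsets(tokens: List[str], embeddings: Dict[str, Vector],
                   filters: List[List[Vector]],
                   heads: Dict[str, Tuple[Vector, float]]) -> Dict[str, float]:
    """Hypothetical helper: shared CNN encoder plus one sigmoid head per disease."""
    dim = len(next(iter(embeddings.values())))
    emb = [embeddings.get(t, [0.0] * dim) for t in tokens]  # unknown word -> zeros
    h = conv_relu_maxpool(emb, filters)  # shared note representation
    return {d: 1.0 / (1.0 + math.exp(-(b + sum(wi * hi for wi, hi in zip(w, h)))))
            for d, (w, b) in heads.items()}


random.seed(0)
DIM = 4
vocab = ["patient", "denies", "chest", "pain", "edema", "noted"]
embeddings = {t: [random.gauss(0, 1) for _ in range(DIM)] for t in vocab}
# Four filters: two of width 2 and two of width 3.
filters = [[[random.gauss(0, 0.5) for _ in range(DIM)] for _ in range(width)]
           for width in (2, 2, 3, 3)]
heads = {d: ([random.gauss(0, 0.5) for _ in filters], 0.0)
         for d in ("heart_failure", "kidney_failure", "stroke")}
print(predict_onsets("patient denies chest pain edema noted".split(),
                     embeddings, filters, heads))
```

Sharing the encoder across the disease heads is what makes the setup multi-task. Each head only adds a small weight vector on top of the same pooled features.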
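
The review does not say how the lab and vital-sign values are pulled out of the notes. A minimal regex-based sketch shows one way to turn numbers mentioned in free text into a fixed-length input vector. It assumes a small hypothetical list of measurement names and a `[last value, present flag]` encoding per measurement.

```python
import re
from typing import Dict, List

# Hypothetical measurement names and the regex that recognises each one.
LAB_PATTERNS = {
    "creatinine": r"\bcreatinine\b",
    "hba1c": r"\b(?:hba1c|a1c)\b",
    "bmi": r"\bbmi\b",
}
# Optional ":", "=" or "of" between the name and the number, e.g. "creatinine of 1.8".
NUMBER = r"\s*(?::|=|of)?\s*(\d+(?:\.\d+)?)"


def extract_values(note: str) -> Dict[str, List[float]]:
    """Return every number that directly follows a known measurement name."""
    text = note.lower()
    found: Dict[str, List[float]] = {}
    for name, pattern in LAB_PATTERNS.items():
        for match in re.finditer(pattern + NUMBER, text):
            found.setdefault(name, []).append(float(match.group(1)))
    return found


def to_feature_vector(values: Dict[str, List[float]]) -> List[float]:
    """[last mentioned value, present flag] per measurement, in a fixed order (assumed encoding)."""
    vec: List[float] = []
    for name in LAB_PATTERNS:
        vals = values.get(name, [])
        vec += [vals[-1] if vals else 0.0, 1.0 if vals else 0.0]
    return vec


note = "Prior creatinine 1.5; today creatinine of 1.8. HbA1c: 7.2."
print(extract_values(note))
print(to_feature_vector(extract_values(note)))
```

The present flag lets a downstream model distinguish "not mentioned" from a genuine value of 0.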
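
The masked multi-task loss can be illustrated with a minimal sketch in plain Python. It assumes one sigmoid output per disease and encodes an ineligible patient-disease pair as `None`. Both that encoding and the choice to average only over eligible pairs are assumptions made for illustration; the review does not confirm either detail.

```python
import math
from typing import List, Optional


def masked_multitask_bce(
    probs: List[List[float]],
    labels: List[List[Optional[int]]],
) -> float:
    """Mean binary cross-entropy over eligible (patient, disease) pairs.

    probs[i][k]:  predicted onset probability for patient i and disease k.
    labels[i][k]: 1 (onset), 0 (no onset), or None when patient i is not
                  eligible for disease k (assumed encoding); None pairs are
                  masked out and contribute nothing to the loss.
    """
    eps = 1e-7  # clip probabilities so log() never sees 0
    total, count = 0.0, 0
    for p_row, y_row in zip(probs, labels):
        for p, y in zip(p_row, y_row):
            if y is None:
                continue  # masked pair
            p = min(max(p, eps), 1.0 - eps)
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count if count else 0.0


# Two patients, three diseases (heart failure, kidney failure, stroke).
probs = [[0.9, 0.2, 0.5], [0.3, 0.8, 0.1]]
labels = [[1, 0, None], [None, 1, 0]]
print(masked_multitask_bce(probs, labels))
```

Masked pairs are skipped, so changing `probs[0][2]` or `probs[1][0]` leaves the loss unchanged. As a result, the model receives no gradient from those pairs.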
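
NegEx itself is a rule-based algorithm driven by curated lists of negation triggers and scope terminators. The sketch below is a heavily simplified, NegEx-style tagger, not the original algorithm. The trigger list, the five-token scope window and the `NEG_` prefix are all assumptions chosen for illustration. The tagger prefixes every token inside a negation scope, so a downstream model sees `NEG_pain` as a different token from `pain`.

```python
import re
from typing import List

# Hypothetical, heavily reduced lists. Real NegEx uses curated lists of
# pre- and post-negation triggers, pseudo-triggers and termination terms.
# Longer triggers come first so "no evidence of" wins over "no".
NEG_TRIGGERS = [("no", "evidence", "of"), ("denies",), ("without",), ("no",), ("not",)]
TERMINATORS = {"but", "however", "although"}
WINDOW = 5  # assumed maximum number of tokens tagged after a trigger


def tokenize(text: str) -> List[str]:
    """Lower-case words and single punctuation marks as separate tokens."""
    return re.findall(r"[a-z0-9]+|[^\sa-z0-9]", text.lower())


def tag_negations(text: str, prefix: str = "NEG_") -> List[str]:
    tokens = tokenize(text)
    out: List[str] = []
    i = 0
    while i < len(tokens):
        trigger = next((t for t in NEG_TRIGGERS
                        if tuple(tokens[i:i + len(t)]) == t), None)
        if trigger is None:
            out.append(tokens[i])
            i += 1
            continue
        out.extend(trigger)  # the trigger words themselves stay untagged
        i += len(trigger)
        tagged = 0
        while i < len(tokens) and tagged < WINDOW:
            tok = tokens[i]
            if tok in TERMINATORS or not tok[0].isalnum():
                break  # punctuation or a termination term closes the scope
            out.append(prefix + tok)
            i += 1
            tagged += 1
    return out


print(tag_negations("Patient denies chest pain but reports dyspnea."))
```

One reason to tag negated tokens rather than delete them is that the finding stays in the input while the model can learn a separate embedding for its negated form.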
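
A dependency-free sketch can make the TF-IDF baseline concrete. It has three parts:

- Smoothed IDF weights of the form log((1 + N) / (1 + df)) + 1, which is the same form as scikit-learn's `TfidfVectorizer` default.
- L2-normalized sparse vectors.
- Logistic regression trained with plain stochastic gradient descent.

The toy corpus, learning rate, epoch count and L2 strength are assumptions. The review does not describe the paper's actual preprocessing or solver.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Sparse = Dict[str, float]


def fit_idf(docs: List[List[str]]) -> Dict[str, float]:
    """Smoothed IDF: log((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(doc: List[str], idf: Dict[str, float]) -> Sparse:
    """Raw term counts times IDF, L2-normalized; unseen terms are dropped."""
    vec = {t: c * idf[t] for t, c in Counter(doc).items() if t in idf}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def sigmoid(z: float) -> float:
    # numerically stable for large |z|
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(x: Sparse, w: Sparse, b: float) -> float:
    return sigmoid(b + sum(w.get(t, 0.0) * v for t, v in x.items()))


def train_logreg(X: List[Sparse], y: List[int], epochs: int = 200,
                 lr: float = 0.5, l2: float = 1e-3) -> Tuple[Sparse, float]:
    """Per-example SGD on log-loss; L2 is applied only to features present in the example."""
    w: Sparse = {}
    b = 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            g = predict(x, w, b) - label  # d(log-loss)/dz
            b -= lr * g
            for t, v in x.items():
                w[t] = w.get(t, 0.0) - lr * (g * v + l2 * w.get(t, 0.0))
    return w, b


# Toy tokenized notes with hypothetical onset labels.
notes = [["bilateral", "edema", "dyspnea"], ["edema", "orthopnea"],
         ["normal", "exam"], ["routine", "visit", "normal"]]
labels = [1, 1, 0, 0]
idf = fit_idf(notes)
w, b = train_logreg([tfidf(n, idf) for n in notes], labels)
print(predict(tfidf(["edema"], idf), w, b))
```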
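
The review only says that the visualization is based on log-odds. One plausible reading is occlusion: score each n-gram by how much the model's predicted log-odds fall when that n-gram is deleted from the note. This reading is assumed here rather than taken from the paper. `toy_model` and its word weights are hypothetical stand-ins for a trained network.

```python
import math
from typing import Callable, List, Tuple


def log_odds(p: float, eps: float = 1e-7) -> float:
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def ngram_log_odds_scores(tokens: List[str],
                          predict: Callable[[List[str]], float],
                          n: int = 2) -> List[Tuple[str, float]]:
    """Score each n-gram by how much removing it lowers the predicted log-odds."""
    base = log_odds(predict(tokens))
    scores = []
    for i in range(len(tokens) - n + 1):
        occluded = tokens[:i] + tokens[i + n:]  # note with this n-gram deleted
        scores.append((" ".join(tokens[i:i + n]), base - log_odds(predict(occluded))))
    return sorted(scores, key=lambda s: s[1], reverse=True)  # most influential first


# Hypothetical stand-in for a trained model: a bag-of-words logistic scorer.
WEIGHTS = {"edema": 1.5, "dyspnea": 1.0, "normal": -0.5}


def toy_model(tokens: List[str]) -> float:
    z = -1.0 + sum(WEIGHTS.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-z))


for phrase, score in ngram_log_odds_scores(["bilateral", "edema", "and", "dyspnea"], toy_model):
    print(f"{score:+.3f}  {phrase}")
```

Unlike gradient-based saliency, this approach needs only forward passes, at the cost of one extra prediction per n-gram.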

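Experimental Results

Research Questions

RQ1: Can unstructured medical notes improve disease onset prediction compared with using structured data alone?
RQ2: Which deep learning architecture makes the most effective use of notes for predicting the onset of multiple diseases?
RQ3: How do negations and numerical values in the notes affect predictive performance?
RQ4: Which visualization method is most effective for helping clinicians interpret model predictions?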

Key Findings

Model	Heart Failure AUC	Kidney Failure AUC	Stroke AUC
Log Reg Lab/Demo	0.781	0.724	0.70
LSTM Lab/Demo	0.813	0.743	0.699
Logistic Reg Notes	0.810	0.752	0.708
CNN PubMed Embeddings	0.844	0.799	0.711
CNN Single Task	0.847	0.796	0.706
CNN	0.854	0.802	0.714
CNN + Neg Tag	0.867	0.811	0.727
CNN + Neg Tag + Dense	0.880	0.812	0.733
CNN + Neg Tag + Dense + Lab/Demo	0.893	0.822	0.749
BiLSTM	0.869	0.807	0.738
BiLSTM + Neg Tag	0.875	0.811	0.745
BiLSTM + Neg Tag + Dense	0.892	0.823	0.739
BiLSTM + Neg Tag + Dense + Lab/Demo	0.900	0.833	0.753
Enc CNN-LSTM	0.859	0.797	0.727
Enc CNN-LSTM + Lab/Demo	0.885	0.812	0.740
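
Every number in the table is an AUC (area under the ROC curve). A minimal sketch computes it from its rank interpretation: the probability that a randomly chosen positive patient receives a higher score than a randomly chosen negative one, with ties counted as one half. The scores and labels in the example are made up.

```python
from typing import List


def roc_auc(scores: List[float], labels: List[int]) -> float:
    """P(score of a random positive > score of a random negative), ties = 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75: 3 of 4 pairs ranked correctly
```

The pairwise loop costs O(P*N). That is fine for illustration, but for a cohort of about a million patients a sort-based computation would be preferable.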

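The deep learning models that use medical notes outperform both models that use only lab and demographic data. Logistic regression on notes also beats them for kidney failure and stroke, but trails the LSTM Lab/Demo model on heart failure (0.810 vs 0.813).
Deep learning models with notes achieve a higher AUC than the logistic regression baseline on TF-IDF features, with one exception: CNN Single Task on stroke (0.706 vs 0.708).
Adding negation tagging improves both the CNN and the BiLSTM on all three diseases, and combining it with lab/demographic inputs gives the best results for heart failure (CHF), kidney failure (KF), and stroke.
The best model (BiLSTM with negation tagging, a dense layer, and lab/demographic inputs) achieves the highest AUC for all three diseases: 0.900 for heart failure, 0.833 for kidney failure, and 0.753 for stroke.
Log-odds-based visualization gives more intuitive explanations than gradient-based methods.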

This review was generated by AI and checked by a human editor.