QUICK REVIEW

[Paper Review] Medical Concept Representation Learning from Electronic Health Records and its Application on Heart Failure Prediction

Edward Choi, Andy Schuetz|arXiv (Cornell University)|Feb 11, 2016

Machine Learning in Healthcare | References: 27 | Citations: 117

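One-Line Summary

This study learns concept representations from longitudinal EHR data by exploiting temporal co-occurrence, and demonstrates that these representations improve heart failure prediction.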

ABSTRACT

Objective: To transform heterogeneous clinical data from electronic health records into clinically meaningful constructed features using a data-driven method that relies, in part, on temporal relations among data.

Materials and Methods: Clinically meaningful representations of medical concepts and patients are key to health analytics applications. Most existing approaches directly construct features mapped to raw data (e.g., ICD or CPT codes), or utilize some ontology mapping such as SNOMED codes. However, none of the existing approaches leverage EHR data directly for learning such concept representations. We propose a new way to represent heterogeneous medical concepts (e.g., diagnoses, medications and procedures) based on co-occurrence patterns in longitudinal electronic health records. The intuition behind the method is to map medical concepts that co-occur closely in time to similar concept vectors so that their distance will be small. We also derive a simple method to construct patient vectors from the related medical concept vectors.

Results: For qualitative evaluation, we study similar medical concepts across diagnosis, medication and procedure. In quantitative evaluation, our proposed representation significantly improves the predictive modeling performance for onset of heart failure (HF), where classification methods (e.g., logistic regression, neural network, support vector machine and K-nearest neighbors) achieve up to 23% improvement in area under the ROC curve (AUC) using this proposed representation.

Conclusion: We proposed an effective method for patient and medical concept representation learning. The resulting representation can map relevant concepts together and also improves predictive modeling performance.

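Motivation and Objectives

Transform heterogeneous EHR data into clinically meaningful, data-driven features.
Propose a representation learning approach for diagnoses, medications, and procedures based on temporal co-occurrence.
Construct patient vectors from concept vectors to enable downstream prediction tasks.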

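Proposed Method

Represent medical concepts (diagnoses, medications, procedures) using co-occurrence patterns in longitudinal EHR data.
Map concepts that co-occur closely in time to similar vectors, so that vector proximity reflects clinical relatedness.
Derive a simple method for constructing patient vectors from concept vectors for use in prediction.
Evaluate representation quality qualitatively by examining concept similarity across categories.
Evaluate heart failure onset prediction quantitatively using standard classifiers (logistic regression, neural network, SVM, k-NN).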
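
The first three steps above can be illustrated with a minimal sketch. This is not the paper's algorithm: the review does not describe the training procedure, so raw co-occurrence counts within a time window stand in for learned concept vectors, and summing concept vectors into a patient vector is an assumed aggregation. The toy records, the `DX:`/`RX:`/`PX:` code prefixes, the 7-day window, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy records: each patient is a list of (day, code) events.
patients = {
    "p1": [(0, "DX:hypertension"), (1, "RX:lisinopril"), (40, "PX:echocardiogram")],
    "p2": [(0, "DX:hypertension"), (2, "RX:lisinopril"), (3, "DX:diabetes")],
    "p3": [(0, "DX:diabetes"), (1, "RX:metformin")],
}


def cooccurrence_counts(events, window_days):
    """Count unordered pairs of distinct codes recorded within window_days of each other."""
    counts = Counter()
    for (t1, c1), (t2, c2) in combinations(sorted(events), 2):
        if c1 != c2 and abs(t2 - t1) <= window_days:
            counts[tuple(sorted((c1, c2)))] += 1
    return counts


def build_concept_vectors(patients, window_days=7):
    """Use rows of the symmetric co-occurrence matrix as crude concept vectors.

    Assumption: a real system would learn dense embeddings from these
    co-occurrences; raw counts are used here only to keep the sketch short.
    """
    total = Counter()
    for events in patients.values():
        total.update(cooccurrence_counts(events, window_days))
    vocab = sorted({code for events in patients.values() for _, code in events})
    index = {code: i for i, code in enumerate(vocab)}
    vectors = {code: [0.0] * len(vocab) for code in vocab}
    for (a, b), n in total.items():
        vectors[a][index[b]] += n
        vectors[b][index[a]] += n
    return vocab, vectors


def patient_vector(events, vectors):
    """Assumed aggregation: element-wise sum of the patient's concept vectors."""
    dim = len(next(iter(vectors.values())))
    out = [0.0] * dim
    for _, code in events:
        for i, value in enumerate(vectors[code]):
            out[i] += value
    return out


vocab, vectors = build_concept_vectors(patients, window_days=7)
p3_vec = patient_vector(patients["p3"], vectors)
```

Codes recorded far apart in time (such as the echocardiogram on day 40 for `p1`) contribute nothing under the 7-day window, which is the sense in which only temporally close co-occurrence shapes the vectors. The resulting patient vectors could then be passed to any of the classifiers listed above.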

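Experimental Results

Research Questions

RQ1: Can co-occurrence-based concept representations capture clinically meaningful relationships among diagnoses, medications, and procedures?
RQ2: Do the learned representations improve predictive models of heart failure onset compared with raw coded features?
RQ3: How do different classifiers perform when the proposed representation is used for HF prediction?
RQ4: What qualitative relationships among medical concepts emerge when they are represented in vector space?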
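
Questions about qualitative relationships in vector space (RQ1 and RQ4) are commonly explored by listing each concept's nearest neighbors. The sketch below assumes cosine similarity as the closeness measure; the review does not state which measure the paper uses. The three-dimensional vectors and code names are invented for illustration and are not results from the paper.

```python
from math import sqrt

# Hypothetical concept vectors, made up for this example only.
concept_vectors = {
    "DX:heart_failure": [0.9, 0.1, 0.0],
    "RX:furosemide": [0.8, 0.2, 0.1],
    "PX:echocardiogram": [0.7, 0.3, 0.0],
    "DX:fracture": [0.0, 0.1, 0.9],
    "PX:cast_application": [0.1, 0.0, 0.8],
}


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest_concepts(query, vectors, k=2, other_category_only=False):
    """Return the k codes most similar to query.

    With other_category_only=True, codes sharing the query's prefix
    (the part before ':') are skipped, so only cross-category neighbors remain.
    """
    prefix = query.split(":")[0]
    scored = []
    for code, vec in vectors.items():
        if code == query:
            continue
        if other_category_only and code.split(":")[0] == prefix:
            continue
        scored.append((cosine_similarity(vectors[query], vec), code))
    scored.sort(reverse=True)
    return [code for _, code in scored[:k]]


hf_neighbors = nearest_concepts("DX:heart_failure", concept_vectors, k=2)
fracture_neighbors = nearest_concepts(
    "DX:fracture", concept_vectors, k=1, other_category_only=True
)
```

Restricting neighbors to other categories is one way to check whether a diagnosis sits close to the medications and procedures a clinician would associate with it.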

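Key Findings

Qualitative analysis showed that clinically similar concepts cluster near one another across the diagnosis, medication, and procedure categories.
Quantitative results showed that the proposed representation significantly improved heart failure onset prediction, with up to a 23% improvement in AUC over the baseline.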
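
The review does not say whether the 23% figure is a relative or an absolute AUC gain. As a hedged illustration of the difference, the sketch below computes AUC from its rank-based definition (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one) and reports both kinds of gain. The labels and scores are toy values, not data from the paper, and the function names are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def absolute_gain(baseline_auc, new_auc):
    """Difference in AUC, in AUC points."""
    return new_auc - baseline_auc


def relative_gain(baseline_auc, new_auc):
    """Difference in AUC as a fraction of the baseline AUC."""
    return (new_auc - baseline_auc) / baseline_auc


# Toy example: 1 = heart failure onset, 0 = no onset.
labels = [1, 1, 0, 0, 1, 0]
baseline_scores = [0.60, 0.40, 0.50, 0.30, 0.70, 0.65]  # e.g. raw-code features
new_scores = [0.80, 0.55, 0.50, 0.30, 0.90, 0.60]       # e.g. learned representation

baseline_auc = roc_auc(labels, baseline_scores)
new_auc = roc_auc(labels, new_scores)
abs_gain = absolute_gain(baseline_auc, new_auc)
rel_gain = relative_gain(baseline_auc, new_auc)
```

On this toy data the baseline AUC is 6/9 and the new AUC is 8/9, so the absolute gain is about 0.22 AUC points while the relative gain is about 33%. The two readings can differ substantially, which is why the convention matters when interpreting a reported percentage.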


This review was generated by AI and checked by a human editor.