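[Paper Walkthrough] Medical Concept Representation Learning from Electronic Health Records and its Application on Heart Failure Prediction
The paper learns medical concept representations from longitudinal EHR data by exploiting temporal co-occurrence, and shows that using these representations improves heart failure prediction.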
Objective: To transform heterogeneous clinical data from electronic health records into clinically meaningful constructed features using data-driven methods that rely, in part, on temporal relations among data.
Materials and Methods: Clinically meaningful representations of medical concepts and patients are key for health analytics applications. Most existing approaches directly construct features mapped to raw data (e.g., ICD or CPT codes), or utilize some ontology mapping such as SNOMED codes. However, none of the existing approaches leverages EHR data directly for learning such concept representations. We propose a new way to represent heterogeneous medical concepts (e.g., diagnoses, medications and procedures) based on co-occurrence patterns in longitudinal electronic health records. The intuition behind the method is to map medical concepts that co-occur closely in time to similar concept vectors so that their distance will be small. We also derive a simple method to construct patient vectors from the related medical concept vectors.
Results: For qualitative evaluation, we study similar medical concepts across diagnosis, medication and procedure. In quantitative evaluation, our proposed representation significantly improves the predictive modeling performance for onset of heart failure (HF), where classification methods (e.g., logistic regression, neural network, support vector machine and K-nearest neighbors) achieve up to 23% improvement in area under the ROC curve (AUC) using this proposed representation.
Conclusion: We proposed an effective method for patient and medical concept representation learning. The resulting representation can map relevant concepts together and also improves predictive modeling performance.
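Motivation and Objectives
- Transform heterogeneous EHR data into clinically meaningful, data-driven features.
- Propose a representation learning method for diagnoses, medications, and procedures based on temporal co-occurrence.
- Build patient vectors from concept vectors to support downstream prediction tasks.
Proposed Method
- Represent medical concepts (diagnoses, medications, procedures) through their co-occurrence patterns in longitudinal EHR data.
- Map concepts that co-occur closely in time to similar vectors, so that vector distance reflects clinical relatedness.
- Derive a simple method for composing patient vectors from concept vectors for use in prediction.
- Evaluate representation quality qualitatively by inspecting which concepts are similar across categories.
- Evaluate prediction of heart failure onset quantitatively with standard classifiers (logistic regression, neural network, SVM, k-NN).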
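The steps in this list can be sketched on toy data. The sketch below is not the paper's learning procedure, which this walkthrough does not specify. As a stand-in, it counts how often two codes occur within a fixed time window, factorizes the log-scaled count matrix with a truncated SVD, and averages concept vectors to get a patient vector. The patient records, the code names, the 7-day window, the 2-dimensional embedding, and averaging as the "simple method" are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy data: each patient is a time-ordered list of (day, code) events.
patients = {
    "p1": [(0, "DX_hypertension"), (1, "RX_lisinopril"),
           (30, "DX_heart_failure"), (31, "RX_furosemide")],
    "p2": [(0, "DX_hypertension"), (2, "RX_lisinopril"),
           (90, "PROC_echocardiogram")],
    "p3": [(0, "DX_heart_failure"), (1, "RX_furosemide"),
           (3, "PROC_echocardiogram")],
}

def cooccurrence_matrix(patients, window_days=7):
    """Count pairs of distinct codes seen within window_days of each other."""
    codes = sorted({code for events in patients.values() for _, code in events})
    index = {code: i for i, code in enumerate(codes)}
    counts = np.zeros((len(codes), len(codes)))
    for events in patients.values():
        for i, (t_i, c_i) in enumerate(events):
            for t_j, c_j in events[i + 1:]:
                if c_i != c_j and abs(t_j - t_i) <= window_days:
                    counts[index[c_i], index[c_j]] += 1
                    counts[index[c_j], index[c_i]] += 1
    return codes, index, counts

def concept_vectors(counts, dim=2):
    """Assumed stand-in for the learning step: truncated SVD of log counts."""
    u, s, _ = np.linalg.svd(np.log1p(counts))
    return u[:, :dim] * np.sqrt(s[:dim])

def patient_vector(events, index, vectors):
    """Assumed composition rule: average the vectors of a patient's codes."""
    return np.mean([vectors[index[code]] for _, code in events], axis=0)

def most_similar(code, codes, index, vectors, k=2):
    """Rank the other codes by cosine similarity to the given code."""
    v = vectors[index[code]]
    norms = np.linalg.norm(vectors, axis=1) * np.linalg.norm(v) + 1e-12
    sims = vectors @ v / norms
    ranked = [i for i in np.argsort(-sims) if codes[i] != code]
    return [codes[i] for i in ranked[:k]]

codes, index, counts = cooccurrence_matrix(patients)
vectors = concept_vectors(counts)
p1_vector = patient_vector(patients["p1"], index, vectors)
neighbors = most_similar("DX_heart_failure", codes, index, vectors)
```

With only three toy patients the nearest neighbors are not clinically meaningful. The point is the data flow: timed events go to co-occurrence counts, counts go to concept vectors, and concept vectors go to a fixed-length patient vector that any standard classifier can consume.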
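Experimental Results
Research Questions
- RQ1: Do co-occurrence-based concept representations capture clinically meaningful relationships among diagnoses, medications, and procedures?
- RQ2: Do the learned representations improve models that predict heart failure onset, compared with raw code features?
- RQ3: How do different classifiers perform on HF prediction when using the proposed representation?
- RQ4: What qualitative relationships emerge when medical concepts are represented in a vector space?
Key Findings
- Qualitative analysis shows that clinically similar concepts cluster together across the diagnosis, medication, and procedure categories.
- Quantitative results show that the proposed representation significantly improves prediction of HF onset, with AUC gains of up to 23% over the baseline.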
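Because the quantitative finding is stated in terms of AUC, a short reminder of what the metric measures may help: AUC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as half. The sketch below computes it directly from that definition. The labels and scores are invented for illustration and are not results from the paper; `baseline_scores` and `embedding_scores` are hypothetical names.

```python
import numpy as np

def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("AUC needs at least one positive and one negative case")
    # Compare every positive score against every negative score.
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

# Made-up example: 1 = developed HF, 0 = did not.
labels = [1, 0, 1, 0, 0]
baseline_scores = [0.6, 0.5, 0.4, 0.7, 0.2]   # hypothetical raw-code model
embedding_scores = [0.9, 0.3, 0.7, 0.4, 0.1]  # hypothetical embedding model

auc_baseline = roc_auc(labels, baseline_scores)
auc_embedding = roc_auc(labels, embedding_scores)
```

The pairwise comparison is quadratic in the number of cases, which is fine for a sketch; a rank-based implementation is the usual choice for large cohorts.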
This walkthrough was generated by AI and reviewed by a human editor.