QUICK REVIEW

[Paper Review] REALM: RAG-Driven Enhancement of Multimodal Electronic Health Records Analysis via Large Language Models

Yinghao Zhu, Changyu Ren|arXiv (Cornell University)|Feb 10, 2024

Topic Modeling|Cited by 11

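One-Sentence Summary

REALM uses a retrieval-augmented generation framework to fuse long-context clinical notes, time-series EHR data, and a professional medical knowledge graph, with the aim of improving multimodal EHR prediction, reducing hallucinations, and raising performance on clinical tasks.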

ABSTRACT

The integration of multimodal Electronic Health Records (EHR) data has significantly improved clinical predictive capabilities. Leveraging clinical notes and multivariate time-series EHR, existing models often lack the medical context relevant to clinical tasks, prompting the incorporation of external knowledge, particularly from the knowledge graph (KG). Previous approaches with KG knowledge have primarily focused on structured knowledge extraction, neglecting unstructured data modalities and semantic high dimensional medical knowledge. In response, we propose REALM, a Retrieval-Augmented Generation (RAG) driven framework to enhance multimodal EHR representations that address these limitations. Firstly, we apply Large Language Model (LLM) to encode long context clinical notes and GRU model to encode time-series EHR data. Secondly, we prompt LLM to extract task-relevant medical entities and match entities in professionally labeled external knowledge graph (PrimeKG) with corresponding medical knowledge. By matching and aligning with clinical standards, our framework eliminates hallucinations and ensures consistency. Lastly, we propose an adaptive multimodal fusion network to integrate extracted knowledge with multimodal EHR data. Our extensive experiments on MIMIC-III mortality and readmission tasks showcase the superior performance of our REALM framework over baselines, emphasizing the effectiveness of each module. REALM framework contributes to refining the use of multimodal EHR data in healthcare and bridging the gap with nuanced medical context essential for informed clinical predictions.

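Research Motivation and Goals

Improve clinical prediction by fusing external medical knowledge with multimodal EHR data.
Propose a RAG-driven framework that extracts entities from notes and time series and aligns them to a professional KG to reduce hallucinations.
Develop an adaptive multimodal fusion network that incorporates knowledge-based representations into downstream tasks.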

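Proposed Method

Encode the time-series EHR with a GRU to obtain hTS.
Encode the clinical notes with a long-context LLM to obtain hText.
Extract disease entities from the notes and time series using LLM prompting and rule-based validation.
Match the extracted entities to PrimeKG nodes using dense vector retrieval with a threshold η.
Encode the retrieved knowledge with an LLM to obtain hRAG.
Fuse hTS, hText, and hRAG with a self-attention/cross-attention fusion network to produce z and predict y.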
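The entity-matching step can be illustrated with a minimal sketch. It assumes cosine similarity as the dense-retrieval score and a "greater than or equal to η" acceptance rule; the review states only that dense vector retrieval with a threshold η is used, so both choices are assumptions. The names `cosine`, `match_entities`, the toy embeddings, and the value `eta=0.8` are hypothetical and chosen for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def match_entities(entity_vecs, kg_vecs, eta):
    """Map each extracted entity to its most similar KG node.

    An entity is kept only if its best similarity reaches the threshold eta;
    otherwise it is dropped, so unmatched (possibly hallucinated) entities
    never contribute retrieved knowledge.
    """
    matches = {}
    for entity, vec in entity_vecs.items():
        best_node, best_sim = None, -1.0
        for node, node_vec in kg_vecs.items():
            sim = cosine(vec, node_vec)
            if sim > best_sim:
                best_node, best_sim = node, sim
        if best_node is not None and best_sim >= eta:
            matches[entity] = (best_node, best_sim)
    return matches


# Toy 3-dimensional embeddings (hypothetical; real embeddings come from a text encoder).
entity_vecs = {
    "heart failure": [0.9, 0.1, 0.0],
    "unrecognized term": [0.0, 0.0, 1.0],
}
kg_vecs = {
    "congestive heart failure": [1.0, 0.0, 0.0],
    "sepsis": [0.0, 1.0, 0.0],
}

matches = match_entities(entity_vecs, kg_vecs, eta=0.8)
for entity, (node, sim) in matches.items():
    print(f"{entity} -> {node} (similarity {sim:.3f})")
```

In this toy run only "heart failure" clears the threshold; "unrecognized term" has no sufficiently similar KG node and is discarded. Raising η trades recall of matched entities for stricter agreement with the knowledge graph.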

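Experimental Results

Research Questions

RQ1: Can the RAG-driven framework effectively fuse unstructured data (clinical notes) with structured data (time series) and external medical knowledge for clinical prediction tasks?
RQ2: Do entity extraction and KG matching reduce LLM hallucinations and improve the predictive reliability of EHR analysis?
RQ3: How do adaptive multimodal fusion and modern text embeddings affect the mortality and readmission tasks?
RQ4: How robust is REALM to clinical data sparsity?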

Key Findings

Methods	Mortality AUROC	Mortality AUPRC	Mortality min(+P, Se)	Mortality F1	Readmission AUROC	Readmission AUPRC	Readmission min(+P, Se)	Readmission F1
MPIM	85.24±1.12	50.52±2.56	50.59±2.33	30.53±2.33	78.62±1.58	49.30±3.01	49.65±2.54	26.61±2.20
UMM	84.01±1.10	49.76±2.21	49.41±2.45	36.21±1.90	77.46±1.36	47.81±2.55	47.27±1.91	34.14±2.21
VecoCare	83.43±1.49	47.28±2.68	47.92±2.22	42.52±2.08	76.93±1.82	46.18±2.76	47.22±2.63	38.79±2.27
M3Care	83.33±1.24	47.86±2.33	49.96±1.99	24.81±2.62	76.80±1.55	46.29±2.62	45.38±2.32	21.51±2.23
GRAM	84.70±1.34	49.21±4.45	49.64±2.85	38.02±3.19	77.84±1.49	47.97±3.68	46.95±2.12	35.24±2.89
KAME	84.59±1.11	49.48±3.37	49.51±2.33	36.14±2.24	78.04±1.34	48.23±3.21	47.41±2.50	31.70±2.19
CGL	84.20±1.16	47.64±3.47	47.67±2.61	38.36±2.04	77.47±1.33	46.68±3.33	47.73±2.25	35.34±2.35
KerPrint	85.29±1.21	51.23±3.48	50.88±2.24	37.00±3.54	78.41±1.50	49.70±3.23	49.39±2.53	34.31±2.35
Ours (REALM)	86.22±0.81	52.64±2.47	50.92±2.01	51.83±2.10	80.24±1.53	52.06±2.64	51.20±2.50	50.58±2.51
Ours	85.18±0.95	50.68±2.64	47.90±2.27	49.81±2.37	78.79±1.47	49.69±2.92	51.20±2.50	50.58±2.51

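On MIMIC-III, REALM improves over the baselines on mortality and readmission prediction (AUROC, AUPRC, min(+P, Se), F1).
The RAG-enhanced time-series and text modalities significantly outperform their non-RAG counterparts.
Among the text encoders tested, Qwen-7B gives the best results for embedding long-context clinical notes.
Adaptive multimodal fusion with self-attention and cross-attention integrates the modalities more effectively.
REALM is robust to data sparsity, and the entity-importance analysis suggests that it retrieves higher-quality entity signals.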
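The min(+P, Se) metric reported above combines precision (+P, positive predictive value) and sensitivity (Se, recall). The review does not define how it is computed; the sketch below follows the convention common in MIMIC-III benchmarking code, which takes the maximum over decision thresholds of min(precision, recall). Whether REALM uses exactly this convention is an assumption, and the function name `min_p_se` is hypothetical.

```python
def min_p_se(y_true, y_score):
    """Best achievable min(precision, sensitivity) over all score thresholds.

    y_true: list of 0/1 labels; y_score: list of predicted scores.
    A sample is predicted positive when its score is >= the threshold.
    """
    total_pos = sum(y_true)
    best = 0.0
    for threshold in sorted(set(y_score)):
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
        precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
        sensitivity = tp / total_pos if total_pos > 0 else 0.0
        best = max(best, min(precision, sensitivity))
    return best


# Toy example: four patients, two positive.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(f"min(+P, Se) = {min_p_se(labels, scores):.3f}")
```

Because the score is bounded by the weaker of precision and sensitivity, it penalizes models that achieve high recall only by flagging most patients as positive, which matters for imbalanced outcomes such as in-hospital mortality.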


This review was generated by AI and reviewed by a human editor.