QUICK REVIEW

[Paper Review] A Multimodal Transformer: Fusing Clinical Notes with Structured EHR Data for Interpretable In-Hospital Mortality Prediction

Weimin Lyu, Xinyu Dong|PubMed|Aug 9, 2022

Machine Learning in Healthcare|21 references|35 citations

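One-Sentence Summary

Introduces a multimodal Transformer that fuses time-series structured EHR data with clinical notes, encoding the notes with Clinical BERT and combining both modalities in a multimodal encoder to predict in-hospital mortality, with integrated gradients (IG) and Shapley values providing interpretability.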

ABSTRACT

Deep-learning-based clinical decision support using structured electronic health records (EHR) has been an active research area for predicting risks of mortality and diseases. Meanwhile, large amounts of narrative clinical notes provide complementary information, but are often not integrated into predictive models. In this paper, we provide a novel multimodal transformer to fuse clinical notes and structured EHR data for better prediction of in-hospital mortality. To improve interpretability, we propose an integrated gradients (IG) method to select important words in clinical notes and discover the critical structured EHR features with Shapley values. These important words and clinical features are visualized to assist with interpretation of the prediction outcomes. We also investigate the significance of domain adaptive pretraining and task adaptive fine-tuning on the Clinical BERT, which is used to learn the representations of clinical notes. Experiments demonstrated that our model outperforms other methods (AUCPR: 0.538, AUCROC: 0.877, F1: 0.490).

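Motivation and Objectives

Improve in-hospital mortality prediction by integrating unstructured clinical notes with structured EHR time-series data.
Make predictions more interpretable through word-level and feature-level explanations.
Evaluate how domain-adaptive pretraining and task-adaptive fine-tuning affect predictive performance, particularly for Clinical BERT.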

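Proposed Method

Feed 17 preprocessed clinical variables into the model as time series.
Embed the notes with a fine-tuned Clinical BERT trained on MIMIC data (MBERT).
Fuse the modalities with three encoders: separate encoders for the notes and the time-series data, followed by a multimodal encoder that fuses them in a shared space.
Apply a Transformer to capture temporal dependencies across the ICU stay, and use the T0 token as the multimodal representation.
Concatenate the multimodal representation with the note embedding and predict through an MLP trained with cross-entropy loss and L2 regularization.
Study domain-adaptive pretraining and task-adaptive fine-tuning on the mortality task across BERT variants (BERT, BioBERT, BioRoBERTa, Clinical BERT).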
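
The pooling, concatenation, and prediction steps above can be pictured with a minimal sketch in plain Python. It is not the authors' implementation: it assumes T0 behaves like a [CLS]-style query that attends over per-time-step vectors, omits the learned projections and multiple heads of a real Transformer layer, and lets a single logistic layer stand in for the MLP. Every name (attention_pool, predict, loss), the dimension, and the random values are hypothetical.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attention_pool(query, vectors):
    """Scaled dot-product attention of one query over a list of vectors."""
    scale = math.sqrt(len(query))
    scores = [dot(query, v) / scale for v in vectors]
    top = max(scores)                                  # subtract max for stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors))
              for i in range(len(query))]
    return pooled, weights

def predict(t0_repr, note_emb, w, b):
    """Concatenate both representations, then apply a logistic output layer.
    Assumption: one linear layer replaces the MLP described in the summary."""
    x = t0_repr + note_emb                             # list concatenation
    return 1.0 / (1.0 + math.exp(-(dot(w, x) + b)))

def loss(p, y, w, l2=1e-3):
    """Binary cross-entropy plus an L2 penalty on the weights."""
    eps = 1e-12
    bce = -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
    return bce + l2 * dot(w, w)

random.seed(0)
DIM = 4
# Hypothetical fused vectors for three time steps of an ICU stay.
steps = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(3)]
t0_query = [random.gauss(0, 1) for _ in range(DIM)]    # stands in for the T0 token
note_emb = [random.gauss(0, 1) for _ in range(DIM)]    # stands in for a note embedding
w = [random.gauss(0, 0.1) for _ in range(2 * DIM)]

t0_repr, attn = attention_pool(t0_query, steps)
p = predict(t0_repr, note_emb, w, b=0.0)
print("attention weights:", [round(a, 3) for a in attn])
print("predicted risk:", round(p, 3), "loss if the patient died:", round(loss(p, 1, w), 3))
```

The attention weights sum to 1, so the pooled vector is a weighted average of all time steps, which is one way to read "using information from every time point" without committing to the paper's exact layer design.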

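Experimental Results

Research Questions

RQ1: How can a Transformer effectively fuse clinical notes with structured EHR time-series data for mortality prediction?
RQ2: Do clinical language models with domain-adaptive pretraining and task-adaptive fine-tuning improve predictive performance on this task?
RQ3: Can integrated gradients and Shapley values provide meaningful interpretability for note tokens and structured features, respectively?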
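
For readers unfamiliar with the integrated gradients method named in RQ3, here is a minimal sketch of what it computes. It is an illustration under stated assumptions, not the paper's pipeline: a toy logistic model replaces Clinical BERT, each token is reduced to a single scalar feature instead of an embedding vector, the baseline is all zeros, and the token names and weights are invented.

```python
import math

def model(x, w, b):
    """Toy differentiable scorer: logistic regression over token features."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def gradient(x, w, b):
    """Analytic gradient of the logistic output with respect to the input."""
    p = model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]

def integrated_gradients(x, baseline, w, b, steps=200):
    """Midpoint Riemann-sum approximation of integrated gradients:
    (x - baseline) times the average gradient along the straight path."""
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        for i, g in enumerate(gradient(point, w, b)):
            avg_grad[i] += g / steps
    return [(xi - bi) * gi for xi, bi, gi in zip(x, baseline, avg_grad)]

# Hypothetical tokens and weights, chosen only for illustration.
tokens = ["intubated", "stable", "sepsis"]
x = [1.0, 1.0, 1.0]              # token present
baseline = [0.0, 0.0, 0.0]       # token absent
w = [1.2, -0.8, 1.5]
b = -1.0

attr = integrated_gradients(x, baseline, w, b)
gap = model(x, w, b) - model(baseline, w, b)
for tok, a in zip(tokens, attr):
    print(f"{tok:>10s}  {a:+.4f}")
print("sum of attributions:", round(sum(attr), 4), "output change:", round(gap, 4))
```

The attributions add up to the change in model output between the baseline and the input (the completeness property), which is what makes per-token scores comparable within one note.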

Key Findings

Input	Model	AUCPR	AUCROC	F1
Variables only	LSTM	0.460(±0.013)	0.821(±0.006)	0.392(±0.038)
Variables only	Transformer	0.473(±0.011)	0.827(±0.005)	0.406(±0.025)
Notes only	MBERT	0.482(±0.012)	0.851(±0.005)	0.382(±0.079)
Fusion	MBERT+LSTM	0.508(±0.002)	0.859(±0.001)	0.478(±0.023)
Fusion	Multimodal Transformer (Ours)	0.538(±0.004)	0.877(±0.001)	0.490(±0.036)
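
As a reminder of what the three reported metrics measure, the sketch below computes them from scratch on made-up labels and scores. It assumes average precision as the estimator of AUCPR, a 0.5 decision threshold for F1, and no tied scores when ranking for average precision; the summary does not state which estimator or threshold the authors used.

```python
def auroc(labels, scores):
    """Probability that a random positive outranks a random negative
    (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive.
    A common step-wise estimate of the area under the PR curve."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits

def f1(labels, scores, threshold=0.5):
    """Harmonic mean of precision and recall at a fixed threshold."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, pred) if y == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Made-up example: 1 = died in hospital, 0 = survived.
labels = [0, 0, 1, 0, 1, 0, 0, 1]
scores = [0.1, 0.3, 0.8, 0.2, 0.4, 0.6, 0.05, 0.9]

print("AUCROC:", round(auroc(labels, scores), 4))
print("AUCPR (average precision):", round(average_precision(labels, scores), 4))
print("F1 at 0.5:", round(f1(labels, scores), 4))
```

AUCPR and F1 depend on how rare the positive class is, which is one reason they sit well below AUCROC in the table.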

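The multimodal Transformer achieves the best performance: AUCPR 0.538, AUCROC 0.877, F1 0.490.
It improves on the variables-only, notes-only, and simple-fusion models; relative to the MBERT+LSTM fusion baseline, AUCPR rises by 0.030 and AUCROC by 0.018.
Domain-adaptive pretraining and task-adaptive fine-tuning (the Clinical BERT variants) have a marked effect on performance, and the Clinical BERT-based model performs best after task adaptation.
Integrated gradients highlights clinically meaningful note tokens such as symptoms and prognostic indicators, while Shapley values identify the structured features that contribute most to the prediction (e.g., Glasgow Coma Scale, respiratory and hemodynamic measures).
The proposed architecture achieves time-aware fusion through Transformer attention, using information from every time point of the ICU stay.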
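
The Shapley-value attribution mentioned in the findings can be illustrated with an exact computation on a toy value function. This is a sketch under assumptions, not the authors' procedure: the feature names, the risk numbers, and the interaction term are invented, and exhaustive enumeration like this is only practical for a handful of features (for the paper's 17 variables an approximation would normally be used).

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, features):
    """Exact Shapley values by enumerating every coalition (exponential cost)."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi

def risk(coalition):
    """Hypothetical predicted risk when only these features are known."""
    score = 0.10                                   # base rate
    if "gcs" in coalition:
        score += 0.30
    if "resp_rate" in coalition:
        score += 0.10
    if "heart_rate" in coalition:
        score += 0.05
    if "gcs" in coalition and "resp_rate" in coalition:
        score += 0.06                              # invented interaction term
    return score

features = ["gcs", "resp_rate", "heart_rate"]
phi = shapley_values(risk, features)
for name, value in sorted(phi.items(), key=lambda kv: -kv[1]):
    print(f"{name:>10s}  {value:+.3f}")
```

The interaction bonus is split evenly between the two features that create it, and the values sum to the difference between the full-information risk and the base rate, so each number reads as a share of the prediction.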

This review was generated by AI and reviewed by a human editor.