QUICK REVIEW

[Paper Review] A Transformer-based representation-learning model with unified processing of multimodal input for clinical diagnostics

Hong-Yu Zhou, Yizhou Yu|arXiv (Cornell University)|Jun 1, 2023

COVID-19 diagnosis using AI | Cited by 11

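One-sentence summary

A Transformer-based model converts multimodal clinical data (images, text, laboratory data, and demographic information) into a single unified representation to aid diagnosis and predict COVID-19 outcomes, outperforming image-only and non-unified multimodal baselines.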

ABSTRACT

During the diagnostic process, clinicians leverage multimodal information, such as chief complaints, medical images, and laboratory-test results. Deep-learning models for aiding diagnosis have yet to meet this requirement. Here we report a Transformer-based representation-learning model as a clinical diagnostic aid that processes multimodal input in a unified manner. Rather than learning modality-specific features, the model uses embedding layers to convert images and unstructured and structured text into visual tokens and text tokens, and bidirectional blocks with intramodal and intermodal attention to learn a holistic representation of radiographs, the unstructured chief complaint and clinical history, structured clinical information such as laboratory-test results and patient demographic information. The unified model outperformed an image-only model and non-unified multimodal diagnosis models in the identification of pulmonary diseases (by 12% and 9%, respectively) and in the prediction of adverse clinical outcomes in patients with COVID-19 (by 29% and 7%, respectively). Leveraging unified multimodal Transformer-based models may help streamline triage of patients and facilitate the clinical decision process.
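The abstract's description of turning images and structured and unstructured text into visual tokens and text tokens can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names (`patchify`, `serialize_structured`, `text_tokens`, `build_sequence`), the patch size, the whitespace tokenizer, the "name value" serialization of structured fields, and the toy patient values are hypothetical and are not taken from the paper. The sketch also stops before the learned embedding layers, which would map each raw token to a vector.

```python
def patchify(image, patch):
    """Split a 2D grayscale image (a list of rows) into flattened,
    non-overlapping patch x patch blocks; leftover border pixels are dropped."""
    height, width = len(image), len(image[0])
    tokens = []
    for top in range(0, height - height % patch, patch):
        for left in range(0, width - width % patch, patch):
            tokens.append([image[r][c]
                           for r in range(top, top + patch)
                           for c in range(left, left + patch)])
    return tokens


def serialize_structured(record):
    """Render structured fields as plain 'name value' text (an assumed
    convention) so they can pass through the same text tokenizer."""
    return " ".join(f"{name} {value}" for name, value in record.items())


def text_tokens(text):
    """Toy whitespace tokenizer standing in for a real subword tokenizer."""
    return text.lower().split()


def build_sequence(image, complaint, record, patch=2):
    """Concatenate visual and text tokens into one sequence, tagging each
    token with its modality so later layers can tell them apart."""
    visual = [("image", p) for p in patchify(image, patch)]
    words = text_tokens(complaint + " " + serialize_structured(record))
    textual = [("text", w) for w in words]
    return visual + textual


# Toy inputs, invented purely for illustration.
toy_image = [[r * 4 + c for c in range(4)] for r in range(4)]
sequence = build_sequence(toy_image, "cough and fever", {"age": 63, "wbc": 11.2})
print(len(sequence), sequence[0], sequence[-1])
```

The point of the single tagged sequence is that one set of Transformer blocks can then attend over all modalities at once, rather than each modality needing its own feature extractor.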

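Research motivation and objectives

Address the need to integrate multimodal clinical information during the diagnostic process.
Develop a Transformer-based representation-learning model that processes images, unstructured text, and structured data in a unified manner.
Demonstrate improved diagnostic performance over image-only and non-unified multimodal approaches.
Show the potential benefits for patient triage and clinical decision-making.
Highlight applicability across radiographs, chief complaints, clinical history, laboratory data, and demographic information.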

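Proposed method

Use embedding layers to convert images and text (unstructured and structured) into visual tokens and text tokens.
Apply bidirectional Transformer blocks with intramodal and intermodal attention to learn a holistic representation.
Process radiographs, chief complaints, clinical history, laboratory-test results, and demographic information within a single unified architecture.
Compare the unified multimodal model against an image-only baseline and non-unified multimodal baselines.
Evaluate on pulmonary disease identification and on prediction of adverse COVID-19 outcomes.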
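The difference between intramodal and intermodal attention can be sketched as a difference in which tokens each position is allowed to attend to. This is a simplified illustration, not the paper's implementation: the function `attention`, its `mode` argument, and the use of raw token vectors as queries, keys, and values (with no learned projections and a single head) are assumptions made to keep the example short.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(tokens, modalities, mode):
    """Single-head scaled dot-product self-attention with identity projections.

    mode="intra": a token attends only to tokens of its own modality.
    mode="inter": a token attends to every token in the sequence
                  (bidirectional, no causal mask).
    """
    if mode not in ("intra", "inter"):
        raise ValueError("mode must be 'intra' or 'inter'")
    dim = len(tokens[0])
    outputs = []
    for i, query in enumerate(tokens):
        # Positions this token may attend to under the chosen mode.
        allowed = [j for j in range(len(tokens))
                   if mode == "inter" or modalities[j] == modalities[i]]
        scores = [sum(q * k for q, k in zip(query, tokens[j])) / math.sqrt(dim)
                  for j in allowed]
        weights = softmax(scores)
        # Output is the attention-weighted average of the allowed tokens.
        outputs.append([sum(w * tokens[j][d] for w, j in zip(weights, allowed))
                        for d in range(dim)])
    return outputs


# Toy sequence: two visual tokens followed by one text token.
toy_tokens = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
toy_modalities = ["image", "image", "text"]
intra = attention(toy_tokens, toy_modalities, "intra")
inter = attention(toy_tokens, toy_modalities, "inter")
print(intra[0])
print(inter[0])
```

In the intramodal pass the first visual token's output has no contribution from the text token, while in the intermodal pass it does. A model could alternate or stack such passes; how the paper combines them is not specified in this summary.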

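Experimental results

Research questions

RQ1: Does a unified multimodal Transformer model outperform an image-only model in identifying pulmonary diseases?
RQ2: Does the unified model improve on non-unified multimodal approaches in predicting adverse clinical outcomes in patients with COVID-19?
RQ3: Does unifying multiple data modalities (images, text, structured data) improve diagnostic decisions?
RQ4: How do intramodal and intermodal attention affect the learning of a holistic clinical representation?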

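Main findings

The unified multimodal model outperformed the image-only model in pulmonary disease identification by 12%.
The unified model outperformed non-unified multimodal models in pulmonary disease identification by 9%.
For adverse COVID-19 outcomes, the unified model outperformed the image-only baseline by 29% and non-unified multimodal models by 7%.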

This review was generated by AI and reviewed by a human editor.