QUICK REVIEW

[Paper Review] Towards Medical Artificial General Intelligence via Knowledge-Enhanced Multimodal Pretraining

Bingqian Lin, Zicong Chen | arXiv (Cornell University) | Apr 26, 2023

Multimodal Machine Learning Applications | Cited by 9

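One-Sentence Summary

MOTOR introduces a knowledge-enhanced multimodal pretraining paradigm that injects general and specific medical knowledge into a single medical foundation model, enabling both understanding and generation across diverse tasks, with strong zero-shot and fine-tuned performance on a medical multimodal benchmark.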

ABSTRACT

Medical artificial general intelligence (MAGI) enables one foundation model to solve different medical tasks, which is very practical in the medical domain. It can significantly reduce the requirement of large amounts of task-specific data by sufficiently sharing medical knowledge among different tasks. However, due to the challenges of designing strongly generalizable models with limited and complex medical data, most existing approaches tend to develop task-specific models. To take a step towards MAGI, we propose a new paradigm called Medical-knOwledge-enhanced mulTimOdal pretRaining (MOTOR). In MOTOR, we combine two kinds of basic medical knowledge, i.e., general and specific knowledge, in a complementary manner to boost the general pretraining process. As a result, the foundation model with comprehensive basic knowledge can learn compact representations from pretraining radiographic data for better cross-modal alignment. MOTOR unifies the understanding and generation, which are two kinds of core intelligence of an AI system, into a single medical foundation model, to flexibly handle more diverse medical tasks. To enable a comprehensive evaluation and facilitate further research, we construct a medical multimodal benchmark including a wide range of downstream tasks, such as chest x-ray report generation and medical visual question answering. Extensive experiments on our benchmark show that MOTOR obtains promising results through simple task-oriented adaptation. The visualization shows that the injected knowledge successfully highlights key information in the medical data, demonstrating the excellent interpretability of MOTOR. Our MOTOR successfully mimics the human practice of fulfilling a "medical student" to accelerate the process of becoming a "specialist". We believe that our work makes a significant stride in realizing MAGI.

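Research Motivation and Goals

Motivate the need for medical artificial general intelligence (MAGI) and reduce reliance on task-specific data.
Propose a knowledge-enhanced multimodal pretraining paradigm (MOTOR) that injects general and specific medical knowledge.
Enable a single foundation model to perform both understanding and generation across a variety of downstream medical tasks.
Build a comprehensive medical multimodal benchmark covering understanding and generation tasks, to evaluate models with MAGI capabilities.
Demonstrate interpretability and knowledge-driven attention as evidence of improved cross-modal alignment.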

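Proposed Method

Extend a BLIP-style multimodal pretraining framework with four unimodal encoders (image, report, general knowledge, specific knowledge).
Introduce three cross-modal encoders (image–report, GK–image, SK–image) and a report decoder for generation.
Inject general knowledge (GK) to enhance image features, then retrieve instance-relevant specific knowledge (SK) from a report queue to refine those features.
Add a text-based multi-label classification (MLC) pretraining objective to bridge GK and SK with the data.
Use image-text contrastive (ITC), image-text matching (ITM), and language modeling (LM) objectives as the core pretraining tasks, following BLIP, augmented with GK/SK guidance.
Evaluate on a medical multimodal benchmark covering image–report retrieval, report generation, diagnosis classification, and medical visual question answering.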
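To make two of the pretraining objectives concrete, the sketch below computes a symmetric image-text contrastive (ITC) loss and a multi-label classification (MLC) loss on toy embeddings. It is a simplified illustration, not MOTOR's implementation: the function names, the temperature of 0.07, and the use of plain in-batch negatives (without the momentum encoder and queue that BLIP uses) are all assumptions made here for brevity.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of scores."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def itc_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss; pair i of images and texts is the positive.

    Assumption: in-batch negatives only (hypothetical simplification).
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # sim[i][j]: temperature-scaled cosine similarity of image i and text j.
    sim = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
           for i in imgs]
    # Image-to-text: each row should put its mass on the diagonal entry.
    i2t = -sum(log_softmax(sim[k])[k] for k in range(n)) / n
    # Text-to-image: same, but over columns.
    cols = [[sim[r][c] for r in range(n)] for c in range(n)]
    t2i = -sum(log_softmax(cols[k])[k] for k in range(n)) / n
    return 0.5 * (i2t + t2i)


def mlc_loss(logits, labels):
    """Mean binary cross-entropy with logits over all (sample, label) pairs."""
    total, count = 0.0, 0
    for row_logits, row_labels in zip(logits, labels):
        for z, y in zip(row_logits, row_labels):
            # Stable form of -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))].
            total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
            count += 1
    return total / count


if __name__ == "__main__":
    aligned = [[1.0, 0.0], [0.0, 1.0]]
    swapped = [[0.0, 1.0], [1.0, 0.0]]
    print("ITC, aligned pairs:", itc_loss(aligned, aligned))
    print("ITC, swapped pairs:", itc_loss(aligned, swapped))
    print("MLC, zero logits:", mlc_loss([[0.0, 0.0]], [[1.0, 0.0]]))
```

Aligned pairs yield a loss near zero while mismatched pairs yield a large one, which is the signal that pulls matching image and report embeddings together during pretraining.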

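Experimental Results

Research Questions

RQ1: Can a single medical foundation model draw on both general and specific medical knowledge to handle multimodal tasks spanning understanding and generation?
RQ2: Does knowledge-enhanced pretraining improve cross-modal alignment and downstream task performance relative to a knowledge-free baseline?
RQ3: To what extent can a medical foundation model generalize across radiology, diagnosis, and question-answering tasks under the same pretraining paradigm?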

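Key Findings

MOTOR is the best or a competitive performer on every task, and it significantly outperforms its knowledge-free variants in both zero-shot and fine-tuned settings.
Image–report retrieval on MIMIC-CXR shows that both GK and SK contribute gains; MOTOR reaches the highest or near-highest R@k across configurations, including zero-shot and fine-tuned.
Medical report generation on IU-Xray shows a marked CIDEr improvement, with the full model reaching a CIDEr of 0.699.
In diagnosis classification on ChestX-ray14 and MIMIC-CXR, MOTOR surpasses its knowledge-free variants and is competitive with, and in some cases better than, existing state-of-the-art baselines on AUROC and F1.
Medical visual question answering on VQA-RAD and SLAKE shows that MOTOR improves open-ended and overall accuracy, indicating stronger cross-modal understanding and generation.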
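For readers unfamiliar with the R@k retrieval metric cited above, here is a minimal sketch of how Recall@K is commonly computed from a query-by-candidate similarity matrix. It assumes exactly one correct candidate per query, located at the same index as the query; the benchmark's actual protocol may differ, and `recall_at_k` is a hypothetical helper name.

```python
def recall_at_k(similarity, k):
    """Fraction of queries whose ground-truth candidate ranks in the top k.

    similarity[i][j] is the score of query i against candidate j.
    Assumption: the ground truth for query i is candidate i.
    """
    hits = 0
    for i, scores in enumerate(similarity):
        # Indices of the k highest-scoring candidates for this query.
        top_k = sorted(range(len(scores)), key=lambda j: scores[j],
                       reverse=True)[:k]
        if i in top_k:
            hits += 1
    return hits / len(similarity)


if __name__ == "__main__":
    # Toy image-to-report scores: query 1 ranks the wrong report first.
    sim = [
        [0.9, 0.1, 0.0],
        [0.8, 0.2, 0.1],
        [0.1, 0.2, 0.7],
    ]
    print("R@1:", recall_at_k(sim, 1))
    print("R@2:", recall_at_k(sim, 2))
```

In the toy matrix, two of three queries retrieve the correct report at rank 1, and all three do so within the top 2.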

This review was generated by AI and reviewed by a human editor.