QUICK REVIEW

[Paper Review] RaDialog: A Large Vision-Language Model for Radiology Report Generation and Conversational Assistance

Chantal Pellegrini, Ege Özsoy | arXiv (Cornell University) | Nov 30, 2023

Multimodal Machine Learning Applications | Cited by 11

One-Sentence Summary

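RaDialog is a publicly available large vision-language model that integrates image features and structured findings with a large language model to generate radiology reports and support interactive dialog, achieving state-of-the-art clinical correctness on MIMIC-CXR.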

ABSTRACT

Conversational AI tools that can generate and discuss clinically correct radiology reports for a given medical image have the potential to transform radiology. Such a human-in-the-loop radiology assistant could facilitate a collaborative diagnostic process, thus saving time and improving the quality of reports. Towards this goal, we introduce RaDialog, the first thoroughly evaluated and publicly available large vision-language model for radiology report generation and interactive dialog. RaDialog effectively integrates visual image features and structured pathology findings with a large language model (LLM) while simultaneously adapting it to a specialized domain using parameter-efficient fine-tuning. To keep the conversational abilities of the underlying LLM, we propose a comprehensive, semi-automatically labeled, image-grounded instruct dataset for chest X-ray radiology tasks. By training with this dataset, our method achieves state-of-the-art clinical correctness in report generation and shows impressive abilities in interactive tasks such as correcting reports and answering questions, serving as a foundational step toward clinical dialog systems. Our code is available on github: https://github.com/ChantalMP/RaDialog.

Motivation and Goals

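Advance automatic radiology report generation by improving clinical correctness.
Enable interactive dialog and report-correction capabilities to support radiologists.
Integrate image features and structured findings into a parameter-efficient LLM pipeline.
Release a publicly available model and instruct dataset to enable a variety of downstream tasks.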

Proposed Method

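Use BioViL-T as the chest X-ray vision encoder to extract patch-level image embeddings.
Align the visual features to the text space with a BERT-based alignment module, producing 32 image tokens.
Integrate a CheXpert classifier to produce structured findings for each image.
Build a single prompt that combines the image tokens, the predicted findings, and the instruction for the LLM.
Fine-tune the Vicuna-7b LLM with LoRA on radiology data and the instruct dataset, using a multi-stage training scheme.
Create an instruct dataset with eight task categories (including report generation, report correction, question answering, summarization, plain-language reports, and explanations) that focuses on radiology tasks while preserving the general capabilities of the LLM.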
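
The prompt-construction step above (image tokens, predicted findings, and an instruction combined in one prompt) can be sketched as follows. This is a minimal illustration under stated assumptions, not RaDialog's actual template: the placeholder token `<img>`, the 0.5 decision threshold, the function names `predicted_findings` and `build_prompt`, and the prompt wording are all hypothetical. Only the 14 label names (the standard CheXpert observations) and the count of 32 image tokens come from the description above.

```python
# Hypothetical sketch: turn per-label classifier probabilities into
# structured findings and assemble a single prompt for the LLM.
# Threshold, placeholder token and template wording are assumptions,
# not RaDialog's published prompt.

CHEXPERT_LABELS = [
    "No Finding", "Enlarged Cardiomediastinum", "Cardiomegaly",
    "Lung Opacity", "Lung Lesion", "Edema", "Consolidation",
    "Pneumonia", "Atelectasis", "Pneumothorax", "Pleural Effusion",
    "Pleural Other", "Fracture", "Support Devices",
]

NUM_IMAGE_TOKENS = 32  # the alignment module yields 32 image tokens


def predicted_findings(probs, threshold=0.5):
    """Return the names of labels whose probability exceeds the threshold."""
    if len(probs) != len(CHEXPERT_LABELS):
        raise ValueError("expected one probability per CheXpert label")
    return [name for name, p in zip(CHEXPERT_LABELS, probs) if p > threshold]


def build_prompt(probs, instruction, image_placeholder="<img>"):
    """Combine image-token slots, predicted findings and an instruction."""
    image_slots = image_placeholder * NUM_IMAGE_TOKENS
    findings = predicted_findings(probs)
    findings_text = ", ".join(findings) if findings else "none"
    return (
        f"Image information: {image_slots}\n"
        f"Predicted findings: {findings_text}\n"
        f"Instruction: {instruction}"
    )


if __name__ == "__main__":
    probs = [0.1] * len(CHEXPERT_LABELS)
    probs[2] = 0.8   # Cardiomegaly
    probs[10] = 0.7  # Pleural Effusion
    print(build_prompt(probs, "Write a radiology report for this chest X-ray."))
```

In a real vision-language model the 32 placeholder slots would typically be replaced at the embedding layer by the aligned image embeddings rather than passed as literal text; that step is omitted here.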
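
LoRA is named above as the parameter-efficient fine-tuning method. In standard LoRA, a pretrained weight matrix W stays frozen and only a low-rank update is trained, so the adapted layer computes h = W x + (alpha / r) * B A x, where A has shape r x d_in, B has shape d_out x r, and the rank r is much smaller than d_in and d_out. The pure-Python sketch below shows that forward pass and the resulting parameter savings; the class name `LoRALinear`, the initialization scale, and the toy dimensions are hypothetical, and nothing here reflects RaDialog's actual rank, alpha, or target layers.

```python
import random


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight, rank, alpha, seed=0):
        self.weight = weight  # frozen pretrained weight, d_out x d_in
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        # A (r x d_in) gets a small random init; B (d_out x r) starts at zero,
        # so the adapted layer is initially identical to the frozen layer.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)               # W x (frozen path)
        update = matvec(self.b, matvec(self.a, x))  # B (A x) (trainable path)
        return [h + self.scale * u for h, u in zip(base, update)]

    def trainable_params(self):
        """Only A and B are trained; W is not counted."""
        return sum(len(row) for row in self.a) + sum(len(row) for row in self.b)


if __name__ == "__main__":
    layer = LoRALinear([[0.0] * 64 for _ in range(64)], rank=4, alpha=8)
    print(layer.trainable_params(), "trainable vs", 64 * 64, "full")
```

For scale: on a 4096 x 4096 projection (the hidden size of a 7B LLaMA-family model such as Vicuna-7b), rank 8 would train 2 * 8 * 4096 = 65,536 parameters instead of about 16.8 million; the rank is an illustrative choice.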

Experimental Results

Research Questions

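RQ1: Can RaDialog generate clinically correct radiology reports from chest X-ray images?
RQ2: Do interactive dialog capabilities improve report quality and allow effective corrections and knowledge queries?
RQ3: How does combining visual features with structured findings affect clinical accuracy compared with text-only approaches?
RQ4: Is the model effective on downstream tasks beyond report generation, such as correction and question answering?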

Key Findings

Method	CE (clinical efficacy F1)	BERTScore	BLEU-1	BLEU-4	METEOR	ROUGE-L
R2Gen [7]	27.6	0.27*	35.3	10.3	14.2	27.7
MDT+WCL [53]	29.4	0.28*	37.3	10.7	14.4	27.4
M²Tr. [34]	30.8	0.39*	37.8	10.7	14.5	27.2
ITA [50]	30.8	-	39.5	12.1	14.7	28.4
METransformer [51]	31.1	-	38.6	12.4	15.2	29.1
Kiut [16]	32.1	-	39.3	11.3	16.0	28.5
RaDialog-INS	38.6	0.39	34.0	9.7	13.6	27.0
RaDialog-RG	39.4	0.40	34.6	9.5	14.0	27.1
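
CE scores like those in the table are typically computed by running a CheXpert-style labeler over the generated and the reference reports and comparing the resulting binary label vectors with an F1 score. The sketch below computes a micro-averaged F1 over such label vectors; the function name `micro_f1`, the choice of micro-averaging, and the toy label vectors are assumptions for illustration, since this review does not specify the paper's exact CE protocol (labeler, averaging, or handling of uncertain labels).

```python
def micro_f1(predicted, reference):
    """Micro-averaged F1 over binary label vectors, one vector per report.

    predicted[i] holds labels extracted from the i-th generated report,
    reference[i] the labels extracted from the matching ground-truth report.
    """
    if len(predicted) != len(reference):
        raise ValueError("need one reference label vector per generated report")
    tp = fp = fn = 0
    for pred_labels, ref_labels in zip(predicted, reference):
        for p, r in zip(pred_labels, ref_labels):
            if p and r:
                tp += 1  # finding present in both reports
            elif p:
                fp += 1  # finding only in the generated report
            elif r:
                fn += 1  # finding missed by the generated report
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, generated labels [[1, 0, 1], [0, 1, 0]] against reference labels [[1, 0, 0], [0, 1, 1]] give two true positives, one false positive, and one false negative, so precision, recall, and F1 are all 2/3.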

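RaDialog achieves state-of-the-art clinical efficacy on MIMIC-CXR: RaDialog-RG reaches a CE of 39.4, 7.3 points above the best prior method (Kiut, 32.1).
On standard benchmarks, the RaDialog-RG and RaDialog-INS variants reach NLG metrics competitive with, or better than, those of larger proprietary models; against the specialized report-generation baselines in the table, however, their NLG scores are somewhat lower (e.g. BLEU-1 of 34.6 vs. 39.5 for ITA).
The instruct-trained RaDialog-INS substantially outperforms the report-generation-only baseline on report correction and other interactive downstream tasks.
RaDialog outperforms MedPaLM-12b on CE and is stronger on NLG metrics, despite being trained on public data and using a smaller model.
Ablation studies show that both the visual input and the structured-findings input are necessary, and that domain-specific fine-tuning is essential for radiology tasks.
The public RaDialog model supports interactive dialog, report correction, region-level QA, and knowledge QA, facilitating human-AI collaboration.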
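
The 7.3 CE margin quoted above follows directly from the results table: the best prior CE score is Kiut's 32.1, and RaDialog-RG scores 39.4. The snippet below reproduces that subtraction from the table values, rounding to one decimal to hide floating-point noise; the function name `ce_margins` is hypothetical. Note that the gap is measured in absolute points of the CE score, not as a relative percentage.

```python
# CE scores copied from the results table above.
prior_ce = {
    "R2Gen": 27.6,
    "MDT+WCL": 29.4,
    "M2Tr.": 30.8,
    "ITA": 30.8,
    "METransformer": 31.1,
    "Kiut": 32.1,
}
radialog_ce = {"RaDialog-INS": 38.6, "RaDialog-RG": 39.4}


def ce_margins(prior, ours):
    """Return the best prior method and each variant's absolute CE gain over it."""
    best = max(prior, key=prior.get)
    return best, {name: round(score - prior[best], 1) for name, score in ours.items()}


best, margins = ce_margins(prior_ce, radialog_ce)
for name, margin in margins.items():
    print(f"{name}: +{margin} CE points over {best}")
# RaDialog-INS: +6.5 CE points over Kiut
# RaDialog-RG: +7.3 CE points over Kiut
```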

This review was generated by AI and reviewed by human editors.