QUICK REVIEW

[Paper Review] Dia-LLaMA: Towards Large Language Model-driven CT Report Generation

Zhixuan Chen, Luyang Luo|arXiv (Cornell University)|Mar 25, 2024

Topic Modeling | Cited by 7

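One-Sentence Summary

Dia-LLaMA adapts LLaMA2-7B for CT report generation, guiding the LLM with a disease-aware attention module, a disease prototype memory bank, and diagnostic text prompts, and achieves state-of-the-art results on CTRG-Chest-548K.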

ABSTRACT

Medical report generation has achieved remarkable advancements yet has still been faced with several challenges. First, the inherent imbalance in the distribution of normal and abnormal cases may lead models to exhibit a biased focus on normal samples, resulting in unreliable diagnoses. Second, the frequent occurrence of common template sentences in the reports may overwhelm the critical abnormal information. Moreover, existing works focus on 2D chest X-rays, leaving CT report generation underexplored due to the high-dimensional nature of CT images and the limited availability of CT-report pairs. Recently, LLM has shown a great ability to generate reliable answers with appropriate prompts, which shed light on addressing the aforementioned challenges. In this paper, we propose Dia-LLaMA, a framework to adapt the LLaMA2-7B for CT report generation by incorporating diagnostic information as guidance prompts. Considering the high dimension of CT, we leverage a pre-trained ViT3D with perceiver to extract the visual information. To tailor the LLM for report generation and emphasize abnormality, we extract additional diagnostic information by referring to a disease prototype memory bank, which is updated during training to capture common disease representations. Furthermore, we introduce disease-aware attention to enable the model to adjust attention for different diseases. Experiments on the chest CT dataset demonstrated that our proposed method outperformed previous methods and achieved state-of-the-art on both clinical efficacy performance and natural language generation metrics. The code will be made publicly available.

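Motivation and Objectives

Address the model's bias toward common normal cases and against rare abnormalities in CT report generation.
Use large language models (LLMs), guided by diagnostic information, to generate coherent CT reports.
Introduce mechanisms that handle high-dimensional CT data and the data imbalance in disease representations.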

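Proposed Method

Combine CT visual embeddings with the LLM prompt through a two-part prompt set (visual tokens and diagnostic tokens).
Use a visual encoder (ViT3D with Perceiver) to extract patch features and project them into the LLM embedding space.
Introduce disease-aware attention to derive disease-level features from the patch features.
Maintain a learnable disease prototype memory bank (normal and abnormal prototypes), updated through a contrastive loss (InfoNCE).
Convert the diagnostic results into text prompts (The {disease} is [state]) to guide the LLM's decoding.
Train with a weighted sum of the disease prototype loss and the language modeling loss (L = L_DP + lambda * L_LM).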
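
The last three steps above (prototype comparison, text prompt construction, and the combined loss) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the disease names, the toy 2-D vectors, the temperature value, the nearest-prototype rule used in `diagnose`, and all function names are assumptions made for illustration only.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    # L2-normalize so that dot products become cosine similarities.
    norm = math.sqrt(dot(v, v)) or 1.0
    return [a / norm for a in v]

def info_nce(feature, positive, negatives, temperature=0.1):
    """InfoNCE for one disease feature: the loss is low when the feature is
    closer to `positive` than to every vector in `negatives`.
    The temperature of 0.1 is an assumed value."""
    f = normalize(feature)
    logits = [dot(f, normalize(positive)) / temperature]
    logits += [dot(f, normalize(n)) / temperature for n in negatives]
    # Negative log-softmax of the positive logit (max-subtracted for stability).
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[0]

def diagnose(feature, normal_proto, abnormal_proto):
    # Assumption: the state is read off as the more similar prototype.
    f = normalize(feature)
    if dot(f, normalize(abnormal_proto)) > dot(f, normalize(normal_proto)):
        return "abnormal"
    return "normal"

def build_prompt(states):
    # Fills the template "The {disease} is [state]" once per disease.
    return " ".join(f"The {d} is {s}." for d, s in states.items())

def total_loss(l_dp, l_lm, lam=1.0):
    # L = L_DP + lambda * L_LM, as stated in the method summary.
    return l_dp + lam * l_lm

# Hypothetical memory bank and disease-level features (toy 2-D vectors).
bank = {
    "nodule": {"normal": [1.0, 0.0], "abnormal": [0.0, 1.0]},
    "effusion": {"normal": [1.0, 0.0], "abnormal": [0.0, 1.0]},
}
features = {"nodule": [0.1, 0.9], "effusion": [0.8, 0.2]}

states = {
    d: diagnose(f, bank[d]["normal"], bank[d]["abnormal"])
    for d, f in features.items()
}
prompt = build_prompt(states)
print(prompt)  # The nodule is abnormal. The effusion is normal.
```

In this toy setup the prototype loss for the "nodule" feature is smaller when the abnormal prototype is treated as the positive than when the normal one is, which is the behavior a contrastive objective is meant to reward.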

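Experimental Results

Research Questions

RQ1: Does disease-aware attention improve the discrimination between abnormal and normal CT findings, and thereby improve LLM-driven report generation?
RQ2: Does the disease prototype memory bank help mitigate the data imbalance against rare abnormalities in CTRG?
RQ3: Do diagnostic text prompts effectively guide the LLM to generate clinically accurate CT reports?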

Key Findings

METHOD	YEAR	CE Precision	CE Recall	CE F1	BLEU-1	BLEU-4	METEOR	ROUGE-L
Ours	-	0.421	0.387	0.372	51.16	29.64	26.28	42.15
R2Gen	2020	0.207	0.121	0.144	34.11	23.39	21.40	47.75
R2GenCMN	2022	0.158	0.100	0.114	35.88	23.37	21.43	45.94
M2KT	2023	0.220	0.119	0.145	46.09	21.93	25.20	36.47
PromptMRG	2023	0.290	0.330	0.290	47.73	23.02	22.87	37.35
SL-DG	2024	-	-	-	-	-	-	43.80
RadFM	2023	0.403	0.361	0.345	46.70	24.70	24.01	38.98

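Achieves state-of-the-art performance on CTRG-Chest-548K for clinical efficacy (CE) and several natural language generation (NLG) metrics.
CE: F1 rises to 0.372, a 7.8% relative improvement over RadFM's 0.345 (the same row reports 0.421 precision and 0.387 recall).
NLG: exceeds every baseline on BLEU-1, BLEU-4, and METEOR; ROUGE-L trails some baselines (42.15 versus 47.75 for R2Gen), which the review attributes to a tendency toward template-sentence generation.
Ablations show that removing DPM (disease prototype memory bank), DAA (disease-aware attention), or DTP (diagnostic text prompts) lowers performance; the full model achieves the strongest results on most metrics.
Text-form diagnostic prompts outperform the None, Token, and Feature prompt variants in most settings, highlighting the effectiveness of language-based guidance.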
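
The 7.8% figure is a relative gain, not an absolute difference in F1 (which is 0.027). A short sketch, using only numbers copied from the results table above, makes the arithmetic explicit; the helper name `relative_gain` and the choice of rows are illustrative.

```python
# Scores copied from the results table (Ours, RadFM, R2Gen rows).
scores = {
    "Ours":  {"CE F1": 0.372, "BLEU-1": 51.16, "BLEU-4": 29.64,
              "METEOR": 26.28, "ROUGE-L": 42.15},
    "RadFM": {"CE F1": 0.345, "BLEU-1": 46.70, "BLEU-4": 24.70,
              "METEOR": 24.01, "ROUGE-L": 38.98},
    "R2Gen": {"CE F1": 0.144, "BLEU-1": 34.11, "BLEU-4": 23.39,
              "METEOR": 21.40, "ROUGE-L": 47.75},
}

def relative_gain(new, old):
    """Relative change of `new` over `old`, in percent."""
    return (new - old) / old * 100.0

# Per-metric relative gain of the proposed model over each baseline.
gains = {
    baseline: {
        metric: round(relative_gain(scores["Ours"][metric], value), 1)
        for metric, value in row.items()
    }
    for baseline, row in scores.items()
    if baseline != "Ours"
}

for baseline, row in gains.items():
    print(baseline, row)
```

A negative entry, such as ROUGE-L against R2Gen, marks a metric on which the proposed model trails that baseline.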

This review was generated by AI and reviewed by human editors.