QUICK REVIEW

[Paper Review] Improving Medical Large Vision-Language Models with Abnormal-Aware Feedback

Yucheng Zhou, Liang Song|arXiv (Cornell University)|Jan 2, 2025

Multimodal Machine Learning Applications | Cited by 4

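One-Sentence Summary

UMed-LVLM uses the MAU dataset together with Abnormal-Aware Instruction Tuning and Abnormal-Aware Rewarding to unveil medical abnormalities, improving abnormality localization and medical image understanding and outperforming existing Med-LVLMs.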

ABSTRACT

Existing Medical Large Vision-Language Models (Med-LVLMs), encapsulating extensive medical knowledge, demonstrate excellent capabilities in understanding medical images. However, there remain challenges in visual localization in medical images, which is crucial for abnormality detection and interpretation. To address these issues, we propose a novel UMed-LVLM designed to unveil medical abnormalities. Specifically, we collect a Medical Abnormalities Unveiling (MAU) dataset and propose a two-stage training method for UMed-LVLM training. To collect MAU dataset, we propose a prompt method utilizing the GPT-4V to generate diagnoses based on identified abnormal areas in medical images. Moreover, the two-stage training method includes Abnormal-Aware Instruction Tuning and Abnormal-Aware Rewarding, comprising Relevance Reward, Abnormal Localization Reward and Vision Relevance Reward. Experimental results demonstrate that our UMed-LVLM significantly outperforms existing Med-LVLMs in identifying and understanding medical abnormalities, achieving a 58% improvement over the baseline. In addition, this work shows that enhancing the abnormality detection capabilities of Med-LVLMs significantly improves their understanding of medical images and generalization capability.

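Research Motivation and Objectives

Improve the visual localization ability of medical LVLMs to strengthen abnormality detection and interpretability.
Build a dataset (MAU) that annotates abnormal regions and provides abnormality-focused diagnoses.
Introduce Abnormal-Aware Instruction Tuning and Abnormal-Aware Rewarding so that Med-LVLMs focus on abnormal regions.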

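Proposed Method

Create MAU with a prompting method that uses GPT-4V to generate diagnoses based on the identified abnormal areas in medical images.
Train UMed-LVLM in two stages: Abnormal-Aware Instruction Tuning, then Abnormal-Aware Rewarding (AAR).
AAR combines an LLM relevance reward framework with Abnormal Localization Rewarding (ALR) and Vision Relevance Rewarding (VRR).
ALR uses the IoU between the predicted box and the ground-truth abnormality box as the localization reward.
VRR evaluates how well attention aligns between abnormality-category tokens and abnormal image patches.
The rewards are normalized and aggregated into an objective optimized with PPO and entropy regularization.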
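
The IoU-based localization reward and the aggregation step can be sketched as follows. This is a minimal illustration, not the paper's implementation: the box format `(x_min, y_min, x_max, y_max)`, the function names `iou` and `combined_reward`, the clipping to [0, 1], and the equal default weights are all assumptions made for this example. The relevance and vision rewards are treated as precomputed scalars, and the PPO update itself is not shown.

```python
from typing import Sequence, Tuple

# Assumed box format for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(pred: Box, truth: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle, if any.
    ix_min = max(pred[0], truth[0])
    iy_min = max(pred[1], truth[1])
    ix_max = min(pred[2], truth[2])
    iy_max = min(pred[3], truth[3])

    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_pred = max(0.0, pred[2] - pred[0]) * max(0.0, pred[3] - pred[1])
    area_truth = max(0.0, truth[2] - truth[0]) * max(0.0, truth[3] - truth[1])
    union = area_pred + area_truth - inter

    # Degenerate boxes with zero total area get a reward of 0.
    return inter / union if union > 0 else 0.0


def combined_reward(
    relevance: float,
    localization: float,
    vision: float,
    weights: Sequence[float] = (1.0, 1.0, 1.0),
) -> float:
    """Hypothetical aggregation: clip each reward to [0, 1], then take a weighted mean.

    The actual normalization and weighting used in the paper are not given
    in this review, so this choice is only a placeholder.
    """
    rewards = [relevance, localization, vision]
    clipped = [min(1.0, max(0.0, r)) for r in rewards]
    total_weight = sum(weights)
    return sum(w * r for w, r in zip(weights, clipped)) / total_weight


if __name__ == "__main__":
    predicted = (0.0, 0.0, 2.0, 2.0)
    ground_truth = (1.0, 1.0, 3.0, 3.0)
    localization_reward = iou(predicted, ground_truth)
    print(localization_reward)
    print(combined_reward(relevance=0.8, localization=localization_reward, vision=0.5))
```

Keeping the localization term as a plain IoU means a prediction that misses the abnormal region entirely contributes nothing to the aggregate, whatever the other two rewards say.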

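Experimental Results

Research Questions

RQ1: Does incorporating abnormality-unveiling data and rewards improve the abnormality localization and diagnostic accuracy of Med-LVLMs?
RQ2: How does Abnormal-Aware Rewarding affect the model's attention and abnormality localization across medical imaging modalities?
RQ3: How well does abnormal-aware training generalize to unseen medical categories and cross-modality data?

Key Findings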

Method	DL	KS	KV	NIH	TBX	Avg
MiniGPT-4	0.02	0.00	0.02	0.00	0.00	0.01
mPLUG-Owl	0.05	0.00	0.01	0.00	0.00	0.01
LLaVA	0.20	0.00	0.04	0.00	0.00	0.05
Qwen-VL	0.13	0.00	0.01	0.00	0.00	0.03
XrayGPT	0.18	0.12	0.02	0.07	0.06	0.09
LLaVA-Med	0.22	0.04	0.12	0.03	0.01	0.08
Med-Flamingo	0.27	0.15	0.15	0.09	0.02	0.14
MedVInt	0.29	0.11	0.27	0.08	0.09	0.17
MedVInt ∗	0.44	0.94	0.95	0.30	0.80	0.69
MedVInt ⋆	0.42	0.93	0.93	0.28	0.78	0.67
UMed-LVLM	0.53	0.99	0.98	0.37	0.86	0.75
GPT-4V	-	-	-	-	-	0.34
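
The Avg column is consistent with the unweighted mean of the five per-dataset scores, rounded to two decimals. Whether the authors computed it exactly this way is an assumption; the sketch below only rechecks three rows copied from the table, and the variable names are illustrative.

```python
# Per-dataset scores (DL, KS, KV, NIH, TBX) copied from three rows of the table.
scores = {
    "MedVInt": [0.29, 0.11, 0.27, 0.08, 0.09],
    "MedVInt *": [0.44, 0.94, 0.95, 0.30, 0.80],
    "UMed-LVLM": [0.53, 0.99, 0.98, 0.37, 0.86],
}

# Unweighted mean per method, rounded to the table's two-decimal precision.
recomputed = {name: round(sum(vals) / len(vals), 2) for name, vals in scores.items()}

for name, avg in recomputed.items():
    print(f"{name}: {avg:.2f}")
```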

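UMed-LVLM outperforms existing Med-LVLMs and several general-purpose LVLMs across the datasets of the MAU test set (DL, KS, KV, NIH, TBX), with an average score of 0.75 against 0.69 for the strongest MedVInt variant.
Two-stage abnormal-aware training yields a significant gain over the baseline, and ablations show that both ALR and VRR contribute to performance.
Generalization: UMed-LVLM outperforms the MedVInt variants and other baselines in cross-dataset and cross-modality settings.
Localization accuracy affects diagnosis: improvements up to an IoU of about 0.6 markedly raise diagnostic performance, while gains beyond that point level off.
The amount of instruction-tuning data and the number of training epochs both matter: more data and more epochs improve results.
Cross-modality: training on a single modality also improves performance when evaluating on other modalities.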

This review was generated by AI and reviewed by human editors.