QUICK REVIEW

[Paper Review] Residual-based Language Models are Free Boosters for Biomedical Imaging

Zhixin Lai, Jing Wu|arXiv (Cornell University)|Mar 26, 2024

Radiomics and Machine Learning in Medical Imaging|Cited by 8

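ONE-SENTENCE SUMMARY

The paper shows that inserting frozen residual blocks taken from pre-trained large language models (LLMs) into a visual encoder improves 2D and 3D biomedical imaging tasks without any language input, reaching strong or state-of-the-art results on the MedMNIST datasets.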

ABSTRACT

In this study, we uncover the unexpected efficacy of residual-based large language models (LLMs) as part of encoders for biomedical imaging tasks, a domain traditionally devoid of language or textual data. The approach diverges from established methodologies by utilizing a frozen transformer block, extracted from pre-trained LLMs, as an innovative encoder layer for the direct processing of visual tokens. This strategy represents a significant departure from the standard multi-modal vision-language frameworks, which typically hinge on language-driven prompts and inputs. We found that these LLMs could boost performance across a spectrum of biomedical imaging applications, including both 2D and 3D visual classification tasks, serving as plug-and-play boosters. More interestingly, as a byproduct, we found that the proposed framework achieved superior performance, setting new state-of-the-art results on extensive, standardized datasets in MedMNIST-2D and 3D. Through this work, we aim to open new avenues for employing LLMs in biomedical imaging and enriching the understanding of their potential in this specialized domain.

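MOTIVATION AND OBJECTIVES

Improve biomedical image analysis under limited labeled data and high computational cost.
Propose a residual-based language model booster (R-LLM) that acts as a frozen encoder block operating on visual tokens.
Evaluate the method on diverse 2D and 3D biomedical imaging datasets to assess how well it generalizes.
Show that the booster matches or exceeds state-of-the-art results without language prompts or pre-trained vision-language models.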

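PROPOSED METHOD

Insert a frozen LLM Transformer block, F_L, into the visual encoder to process visual tokens.
Place trainable adapter layers, F_E before the LLM block and F_D after it, to align feature dimensions.
Apply residual connections around the LLM block to aid gradient flow and information exchange.
Keep the LLM block frozen during training; train F_E, F_D, and the rest of the pipeline end to end.
To adapt the block to visual data, remove the LLM's autoregressive (causal) attention mask and its positional embeddings; no language prompt is needed.
Demonstrate plug-and-play effectiveness on 2D and 3D biomedical datasets and across several backbones (ViT, ViViT, ViT3D).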
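
The steps above can be sketched in a few lines of framework-free Python. This is a minimal illustration under stated assumptions, not the authors' implementation: F_L is replaced by a small stand-in (a single-head self-attention layer with random fixed weights rather than a real pre-trained LLM block), the residual is assumed to take the form x + F_D(F_L(F_E(x))), and the names linear, random_matrix, frozen_block, r_llm_booster, and TRAINABLE are hypothetical helpers introduced only for this example.

```python
import math
import random


def linear(tokens, weight):
    """Apply a weight matrix of shape (out_dim, in_dim) to every token vector."""
    return [[sum(w * x for w, x in zip(row, tok)) for row in weight]
            for tok in tokens]


def random_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix used for both the adapters and the stand-in block."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)]
            for _ in range(rows)]


def frozen_block(tokens, wq, wk, wv):
    """Stand-in for F_L (assumption: NOT a real pre-trained LLM block).

    Single-head self-attention with its own skip connection. There is no
    causal mask, so every token attends to every other token, and no
    positional embedding is added.
    """
    q, k, v = linear(tokens, wq), linear(tokens, wk), linear(tokens, wv)
    scale = math.sqrt(len(q[0]))
    out = []
    for tok, qi in zip(tokens, q):
        scores = [sum(a * b for a, b in zip(qi, kj)) / scale for kj in k]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        mixed = [sum(e / total * vj[d] for e, vj in zip(exps, v))
                 for d in range(len(tok))]
        out.append([t + m for t, m in zip(tok, mixed)])
    return out


def r_llm_booster(tokens, params):
    """Assumed residual form: x + F_D(F_L(F_E(x)))."""
    z = linear(tokens, params["F_E"])        # trainable: visual dim -> LLM dim
    z = frozen_block(z, *params["F_L"])      # frozen: never updated
    delta = linear(z, params["F_D"])         # trainable: LLM dim -> visual dim
    return [[x + d for x, d in zip(tok, dvec)]
            for tok, dvec in zip(tokens, delta)]


# Parameter groups an optimizer would update; F_L is deliberately left out.
TRAINABLE = ("F_E", "F_D")

rng = random.Random(0)
vis_dim, llm_dim, n_tokens = 4, 8, 5
params = {
    "F_E": random_matrix(llm_dim, vis_dim, rng),
    "F_L": tuple(random_matrix(llm_dim, llm_dim, rng) for _ in range(3)),
    "F_D": random_matrix(vis_dim, llm_dim, rng),
}
tokens = [[rng.uniform(-1.0, 1.0) for _ in range(vis_dim)]
          for _ in range(n_tokens)]
out = r_llm_booster(tokens, params)
```

The booster returns tokens with the same count and width as its input, which is what lets it sit between existing encoder layers as a plug-in. In this sketch, setting F_D to all zeros reduces the booster to the identity, one way to see why a residual path around a frozen block cannot erase the visual features on its own; whether the paper initializes F_D that way is not stated in this review.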

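EXPERIMENTAL RESULTS

Research Questions

RQ1: Can a frozen Transformer block taken from a large language model serve as an effective visual encoder layer for biomedical imaging without any language data?
RQ2: Does the residual-based LLM booster improve 2D and 3D biomedical classification across different datasets?
RQ3: Is freezing the LLM and training only the adapter modules more advantageous than fine-tuning the LLM?
RQ4: Do the residual connections play a key role in the performance of the cross-modal booster?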

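Key Findings

R-LLM substantially improves performance on both 2D biomedical imaging datasets (gains in ACC and/or AUC) and 3D datasets.
The method reaches state-of-the-art results on several MedMNIST tasks, surpassing the previous SoTA on OCTMNIST among other datasets.
Freezing the LLM block and training only the adapters gives better results than end-to-end fine-tuning, with less overfitting and lower training complexity.
The residual structure is critical to performance; variants without a proper residual design perform worse.
Grad-CAM visualizations indicate stronger attention to diagnostically relevant regions when R-LLM is used as a booster.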


This review was generated by AI and checked by a human editor.