QUICK REVIEW

[Paper Review] ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders

Shawn Xu, Lin Yang|arXiv (Cornell University)|Aug 2, 2023

COVID-19 diagnosis using AI | Cited by 18

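One-sentence summary

ELIXR aligns a vision encoder with a frozen LLM and, trained on paired chest X-ray images and free-text reports, supports zero-shot classification, data-efficient learning, semantic search, visual question answering (VQA), and radiology report quality assurance (QA).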

ABSTRACT

In this work, we present an approach, which we call Embeddings for Language/Image-aligned X-Rays, or ELIXR, that leverages a language-aligned image encoder combined or grafted onto a fixed LLM, PaLM 2, to perform a broad range of chest X-ray tasks. We train this lightweight adapter architecture using images paired with corresponding free-text radiology reports from the MIMIC-CXR dataset. ELIXR achieved state-of-the-art performance on zero-shot chest X-ray (CXR) classification (mean AUC of 0.850 across 13 findings), data-efficient CXR classification (mean AUCs of 0.893 and 0.898 across five findings (atelectasis, cardiomegaly, consolidation, pleural effusion, and pulmonary edema) for 1% (~2,200 images) and 10% (~22,000 images) training data), and semantic search (0.76 normalized discounted cumulative gain (NDCG) across nineteen queries, including perfect retrieval on twelve of them). Compared to existing data-efficient methods including supervised contrastive learning (SupCon), ELIXR required two orders of magnitude less data to reach similar performance. ELIXR also showed promise on CXR vision-language tasks, demonstrating overall accuracies of 58.7% and 62.5% on visual question answering and report quality assurance tasks, respectively. These results suggest that ELIXR is a robust and versatile approach to CXR AI.

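Motivation and objectives

Use routinely collected chest X-ray (CXR) image-report pairs to train a multimodal model that works across diverse radiology tasks.
Achieve strong zero-shot and data-efficient performance on CXR classification, with rich, text-guided outputs.
Enable multimodal capabilities such as semantic search, visual question answering, and radiology report QA through alignment with an LLM.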

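Proposed method

Two-stage architecture: ELIXR-C trains a language-aligned image encoder with CLIP-style contrastive learning, aligning CXR images with radiology reports.
ELIXR-B inserts a lightweight adapter (Q-Former) between the frozen ELIXR-C encoder and a frozen LLM (PaLM 2-S), mapping image embeddings into the LLM's token space.
Because the LLM and the vision encoder are frozen and only the adapter is trained, training is data-efficient.
Stage 1 uses image-text contrastive learning (ITC), image-grounded text generation (ITG), and image-text matching (ITM) to learn cross-modal representations.
Stage 2 trains the adapter to produce LLM-aligned tokens and to generate report impressions; the LLM is then used for downstream generative tasks such as VQA and report QA.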
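The CLIP-style contrastive objective mentioned above can be illustrated with a minimal sketch of a symmetric InfoNCE loss. This is not the authors' implementation: the function names, the toy two-dimensional embeddings, and the temperature of 0.07 are assumptions made for illustration, and the sketch uses plain Python lists rather than a deep learning framework.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def log_softmax(row):
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of matched (image, report) pairs.

    Row i of image_embs is assumed to be paired with row i of text_embs.
    The temperature value is an illustrative assumption.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j]: temperature-scaled cosine similarity of image i and report j
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Image-to-text direction: the matching report sits on the diagonal.
    i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image direction: same idea over the transposed matrix.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    t2i = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return 0.5 * (i2t + t2i)

# Toy batch: correctly paired embeddings versus deliberately swapped ones.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

The loss is small when each image is most similar to its own report and large when the pairs are mismatched, which is the signal that pulls matching image and report embeddings together during training.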

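Experimental results

Research questions

RQ1: Can ELIXR's zero-shot classification of chest X-ray findings reach performance comparable to state-of-the-art supervised methods?
RQ2: Compared with fully supervised baselines, how data-efficient is CXR classification with ELIXR-C and ELIXR-B?
RQ3: Can the ELIXR framework support semantic search and retrieve clinically relevant CXR images with high quality?
RQ4: Can ELIXR perform VQA and radiology report quality assurance using the LLM-aligned multimodal pipeline?
RQ5: Does the approach generalize across multiple datasets and hospital sources?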

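Key findings

ELIXR achieved state-of-the-art zero-shot classification across 13 CheXpert findings, with a mean AUC of 0.850.
ELIXR-C and ELIXR-B delivered data-efficient linear-probing performance: across five CheXpert findings, mean AUCs were 0.893 with 1% of the training data and 0.898 with 10%, outperforming prior methods.
For semantic search, ELIXR reached an NDCG@5 of 0.76 across nineteen queries, with perfect retrieval on twelve of them.
ELIXR-B outperformed ELIXR-C and MedCLIP in retrieval quality, reaching higher precision@5 and NDCG@5 across query groups.
On the vision-language tasks, ELIXR reached overall accuracies of 58.7% on VQA and 62.5% on report QA.
Compared with existing data-efficient methods such as supervised contrastive learning (SupCon), ELIXR required two orders of magnitude less data to reach similar performance.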
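To make the retrieval metric concrete, here is a minimal sketch of how NDCG@5 can be computed for a single query. The graded relevance labels, the linear gain, and the choice to build the ideal ranking from the retrieved list alone are assumptions for illustration; this review does not state the grading scheme the paper used.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain: each gain is divided by log2(position + 1)."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances, k=5):
    """NDCG@k for one query, given relevance grades in retrieved order.

    Simplifying assumption: the ideal ordering is taken from the same
    retrieved list, not from the full image corpus.
    """
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant item retrieved at all
    return dcg_at_k(ranked_relevances, k) / ideal

# Hypothetical grades: 2 = fully relevant, 1 = partially relevant, 0 = irrelevant.
perfect = ndcg_at_k([2, 2, 1, 0, 0])    # already in ideal order
imperfect = ndcg_at_k([0, 1, 2, 2, 0])  # relevant images ranked too low
print(perfect, imperfect < perfect)
```

Under this definition a query counts as a perfect retrieval when its score equals 1.0, and a figure reported across queries is an aggregate of such per-query scores.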


This review was generated by AI and reviewed by a human editor.