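[Paper Summary] Retrieval Augmented Chest X-Ray Report Generation using OpenAI GPT models
This paper combines retrieval-augmented generation (RAG) with a contrastively pretrained vision-language encoder to retrieve relevant radiology text, then uses OpenAI GPT models to generate chest X-ray impressions, improving clinical metrics while reducing hallucinations.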
We propose Retrieval Augmented Generation (RAG) as an approach for automated radiology report writing. It uses multimodally aligned embeddings from a contrastively pretrained vision-language model to retrieve candidate radiology text relevant to an input radiology image, and a general-domain generative model such as OpenAI text-davinci-003, gpt-3.5-turbo, or gpt-4 to write the report from the retrieved text. This approach keeps hallucinated generations in check and, by leveraging the instruction-following capabilities of these generative models, can produce report content in the desired format. Our approach achieves better clinical metrics, with a BERTScore of 0.2865 (Δ+25.88%) and an S_emb score of 0.4026 (Δ+6.31%). It can be broadly relevant across clinical settings: the automated report generation process can be augmented with content relevant to a given setting, and user intents and requirements can be injected into the prompts to modulate the content and format of the generated reports as that setting requires.
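The Δ values read most naturally as relative gains over the retrieval-only baseline (the findings later in this summary describe the 25.88% BERTScore gain that way). Under that assumption, each reported score also fixes the baseline it was compared against. The small sketch below back-calculates it; the helper names are illustrative, not from the paper.

```python
def relative_gain_pct(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


def implied_baseline(new: float, gain_pct: float) -> float:
    """Baseline implied by a reported score and its relative gain (inverse of relative_gain_pct)."""
    return new / (1.0 + gain_pct / 100.0)


if __name__ == "__main__":
    # Reported scores and gains from the abstract; Δ is ASSUMED to be a relative gain.
    for metric, score, gain in [("BERTScore", 0.2865, 25.88), ("S_emb", 0.4026, 6.31)]:
        baseline = implied_baseline(score, gain)
        print(f"{metric}: implied baseline {baseline:.4f}, "
              f"round trip {relative_gain_pct(score, baseline):.2f}%")
```

With these inputs the implied baselines come out to roughly 0.228 (BERTScore) and 0.379 (S_emb). These are back-calculated under the stated assumption, not figures reported in the paper.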
Motivation and Goals
- Propose and motivate a retrieval-augmented framework for improving radiology report generation.
- Leverage domain-aligned text-image embeddings for selective retrieval of radiology content.
- Use general-purpose GPT models with instruction-following prompts to generate impressions in desired formats.
- Show that outputs can be structured and tailored to the needs of a specific clinical setting.
- Evaluate reductions in hallucinations and improvements in clinical metrics compared to baselines.
Proposed Method
- Construct a retrieval corpus from CXR-PRO impressions (report-level and sentence-level).
- Compute image and text embeddings with a contrastively pretrained vision-language model (ALBEF) trained on CXR-PRO/CXR-ReDonE data.
- Retrieve the top-K most similar sentences or reports for a given chest X-ray image, ranked by the dot product between the image embedding and each text embedding.
- Prompt OpenAI LLMs (text-davinci-003, gpt-3.5-turbo, GPT-4) with retrieved context to generate radiology impressions.
- Optionally refine outputs iteratively when the retrieved context exceeds the model's token limit (a refine mechanism).
- Optionally generate structured JSON outputs by prompting for attributes like pathology, positional info, severity, and size.
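The retrieval and refine steps above can be sketched as follows. This is a minimal illustration, not the authors' code: the embeddings are random stand-ins for ALBEF outputs, the corpus sentences are toy examples, the prompt wording and helper names (`top_k_by_dot_product`, `build_prompt`, `refine_generate`) are hypothetical, `llm` is any callable standing in for an OpenAI model call, and the token limit is approximated by a maximum number of retrieved items per call.

```python
from typing import Callable, List, Sequence

import numpy as np


def top_k_by_dot_product(image_emb: np.ndarray, text_embs: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k corpus texts with the largest dot product against the image embedding."""
    scores = text_embs @ image_emb             # one similarity score per corpus entry
    k = min(k, scores.shape[0])
    top = np.argpartition(-scores, k - 1)[:k]  # the k best, in arbitrary order
    return top[np.argsort(-scores[top])]       # reorder best first


def format_items(items: Sequence[str]) -> str:
    return "\n".join(f"- {item}" for item in items)


def build_prompt(context: Sequence[str]) -> str:
    """Hypothetical instruction prompt asking the model to stay within the retrieved context."""
    return ("You are a radiologist. Using only the retrieved findings below, write a concise "
            "impression for this chest X-ray. Do not add findings that are not supported.\n"
            f"Findings:\n{format_items(context)}\nImpression:")


def refine_generate(context: Sequence[str], llm: Callable[[str], str], max_items_per_call: int) -> str:
    """Refine-style generation: draft from the first chunk, then update the draft with each later chunk."""
    if not context:
        raise ValueError("no retrieved context")
    chunks = [context[i:i + max_items_per_call] for i in range(0, len(context), max_items_per_call)]
    draft = llm(build_prompt(chunks[0]))
    for chunk in chunks[1:]:
        draft = llm(f"Existing impression:\n{draft}\n"
                    f"Additional findings:\n{format_items(chunk)}\n"
                    "Refine the impression using the additional findings only if they are relevant.\n"
                    "Impression:")
    return draft


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    corpus = ["No acute cardiopulmonary process.", "Small left pleural effusion.",
              "Mild cardiomegaly.", "Right lower lobe opacity, possibly pneumonia."]
    text_embs = rng.normal(size=(len(corpus), 8))  # stand-in for ALBEF text embeddings
    image_emb = rng.normal(size=8)                 # stand-in for the ALBEF image embedding
    retrieved = [corpus[i] for i in top_k_by_dot_product(image_emb, text_embs, k=3)]

    calls: List[str] = []

    def stub_llm(prompt: str) -> str:              # replace with a real OpenAI API call
        calls.append(prompt)
        return f"draft after {len(calls)} call(s)"

    print(refine_generate(retrieved, stub_llm, max_items_per_call=2))  # prints: draft after 2 call(s)
```

`np.argpartition` avoids a full sort of the corpus scores, which matters once the corpus holds many sentence-level entries; only the K winners are then sorted.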
Experimental Results
Research Questions
- RQ1: Can retrieval-augmented generation improve the semantic alignment of generated radiology impressions with the ground truth compared with purely retrieval-based methods?
- RQ2: Does pairing a domain-aligned retrieval corpus with general-purpose LLMs reduce hallucinations in radiology report generation?
- RQ3: How does K (the number of retrieved records) affect BERTScore, S_emb, and RadGraph F1?
- RQ4: Can prompting (zero-shot vs. few-shot) steer outputs into structured formats suitable for downstream applications?
- RQ5: Does the approach preserve important clinical entities while reducing noisy or irrelevant content?
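RQ4 concerns steering outputs into structured formats. One way to make such output usable downstream is to append a format instruction to the prompt and validate what comes back. The sketch below assumes a JSON list with one object per finding and hypothetical key names (`pathology`, `position`, `severity`, `size`) mirroring the attributes listed in the method; the paper's actual schema and prompts may differ. Passing example outputs turns the zero-shot prompt into a few-shot one.

```python
import json
from dataclasses import dataclass
from typing import List, Optional, Sequence

# Hypothetical schema: one object per finding; keys mirror the attributes named in the summary.
REQUIRED_KEYS = ("pathology", "position", "severity", "size")
FORMAT_INSTRUCTION = (
    "Return the impression as a JSON list with one object per finding, using the keys "
    '"pathology", "position", "severity" and "size" (null when not stated).'
)


@dataclass
class Finding:
    pathology: str
    position: Optional[str]
    severity: Optional[str]
    size: Optional[str]


def structured_prompt(base_prompt: str, examples: Sequence[str] = ()) -> str:
    """Zero-shot when `examples` is empty, few-shot when example JSON outputs are supplied."""
    shots = "".join(f"Example output:\n{example}\n" for example in examples)
    return f"{base_prompt}\n{FORMAT_INSTRUCTION}\n{shots}Output:"


def parse_findings(raw: str) -> List[Finding]:
    """Parse the model's reply and reject anything that does not match the schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of findings")
    findings = []
    for item in data:
        if not isinstance(item, dict):
            raise ValueError("each finding must be a JSON object")
        missing = [key for key in REQUIRED_KEYS if key not in item]
        if missing:
            raise ValueError(f"finding is missing keys: {missing}")
        if not item["pathology"]:
            raise ValueError("pathology must be non-empty")
        findings.append(Finding(**{key: item[key] for key in REQUIRED_KEYS}))
    return findings


if __name__ == "__main__":
    reply = '[{"pathology": "pleural effusion", "position": "left", "severity": "small", "size": null}]'
    print(parse_findings(reply))
```

Rejecting malformed replies early (rather than passing raw text on) keeps downstream consumers from silently receiving free text when the model ignores the format instruction.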
Key Findings
- RAG with ALBEF-based retrieval and GPT models improves BERTScore by up to about 25.88% over the retrieval-only baseline on CXR-PRO.
- RAG with retrieval improves S_emb by about 6.31% over the baseline for top-K retrievals.
- On MS-CXR, RAG enhances BERTScore and S_emb over the baseline and matches RadGraph F1 in qualitative terms.
- The system reduces hallucinations by constraining generation with the retrieved context: S_emb between generated impressions and the retrieved context is typically high (average 0.8466; about 87% of test impressions have S_emb > 0.70).
- Prompt engineering enables structured JSON outputs containing pathologies, positional info, severity, and size, in addition to free-text impressions.
- RAG-based generation yields concise impressions less noisy than pure retrieval while retaining key clinical entities.
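The grounding statistic in the findings above (mean S_emb between each generated impression and its retrieved context, plus the share above 0.70) reduces to simple vector arithmetic once both texts are embedded. A minimal sketch, assuming S_emb is a cosine similarity between report embeddings (for example CheXbert embeddings, as in earlier CXR report-generation work) and that each impression's retrieved context is embedded as a single text; the function names and toy vectors are hypothetical.

```python
import numpy as np


def row_cosine(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity between corresponding rows of two (n, d) arrays."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return np.sum(a_unit * b_unit, axis=1)


def grounding_summary(generated: np.ndarray, context: np.ndarray, threshold: float = 0.70):
    """Mean similarity and the fraction of generated impressions whose similarity exceeds `threshold`."""
    sims = row_cosine(generated, context)
    return float(sims.mean()), float((sims > threshold).mean())


if __name__ == "__main__":
    generated = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # toy impression embeddings
    context = np.array([[2.0, 0.0], [1.0, 0.0], [1.0, 0.0]])    # toy retrieved-context embeddings
    mean_sim, frac_above = grounding_summary(generated, context)
    print(f"mean={mean_sim:.4f}, above 0.70: {frac_above:.0%}")  # prints: mean=0.5690, above 0.70: 67%
```

A high score here only shows that the output stays close to what was retrieved; it does not check the retrieved text itself against the image.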
This summary was generated by AI and reviewed by human editors.