QUICK REVIEW

[Paper Review] Radiology-GPT: A Large Language Model for Radiology

Zhengliang Liu, Aoxiao Zhong | arXiv (Cornell University) | Jun 14, 2023

Topic Modeling | Cited by 21

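One-Sentence Summary

Radiology-GPT is a large language model specialized for radiology. It is instruction-tuned on MIMIC-CXR data to generate radiology impressions from findings, compares favorably with general-purpose instruction-tuned models on this task, and offers strong privacy advantages for clinical deployment.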

ABSTRACT

We introduce Radiology-GPT, a large language model for radiology. Using an instruction tuning approach on an extensive dataset of radiology domain knowledge, Radiology-GPT demonstrates superior performance compared to general language models such as StableLM, Dolly and LLaMA. It exhibits significant versatility in radiological diagnosis, research, and communication. This work serves as a catalyst for future developments in clinical NLP. The successful implementation of Radiology-GPT is indicative of the potential of localizing generative large language models, specifically tailored for distinctive medical specialties, while ensuring adherence to privacy standards such as HIPAA. The prospect of developing individualized, large-scale language models that cater to specific needs of various hospitals presents a promising direction. The fusion of conversational competence and domain-specific knowledge in these models is set to foster future development in healthcare AI. A demo of Radiology-GPT is available at https://huggingface.co/spaces/allen-eric/radiology-gpt.

Research Motivation and Objectives

Develop a localized, privacy-preserving LLM tailored for radiology to interpret findings and generate impressions.
Demonstrate instruction-tuning effectiveness on radiology data versus general models.
Evaluate outputs with domain-relevant quality metrics beyond traditional NLP benchmarks.
Explore implications for clinical decision support, patient communication, and multi-domain AI collaborations in healthcare.

Proposed Method

Use Alpaca-7B as the base model and apply LoRA fine-tuning to enable efficient, localized training.
Preprocess MIMIC-CXR reports to extract paired Findings and Impression sections for training.
Train with instruction-tuning to map Findings to Impressions using the instruction: 'Derive the impression from findings in the radiology report'.
Evaluate against other LLMs using domain-specific metrics for Understandability, Coherence, Relevance, Conciseness, and Clinical Utility.
Validate on MIMIC-CXR test set and an independent OpenI dataset as external test data.
Highlight privacy advantages by keeping models on hospital infrastructure and adhering to HIPAA.
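
The preprocessing and instruction-formatting steps in the list above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the header pattern (`FINDINGS:` / `IMPRESSION:`), the helper names `extract_pair` and `build_example`, and the Alpaca-style prompt layout are all assumptions, since the summary does not state the exact template or parsing rules. The sample report is invented and not taken from MIMIC-CXR.

```python
import re

# Instruction string quoted in the method description above.
INSTRUCTION = "Derive the impression from findings in the radiology report"

# Assumption: sections are introduced by upper-case headers at the start
# of a line, e.g. "FINDINGS:" and "IMPRESSION:".
SECTION_RE = re.compile(r"^[ \t]*(FINDINGS|IMPRESSION)[ \t]*:", re.MULTILINE)


def extract_pair(report):
    """Return (findings, impression), or None if either section is missing or empty."""
    matches = list(SECTION_RE.finditer(report))
    sections = {}
    for i, m in enumerate(matches):
        # A section runs until the next recognized header or the end of the report.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(report)
        # Collapse line breaks and repeated spaces into single spaces.
        sections[m.group(1)] = " ".join(report[m.end():end].split())
    findings = sections.get("FINDINGS")
    impression = sections.get("IMPRESSION")
    if not findings or not impression:
        return None
    return findings, impression


def build_example(findings, impression):
    """Format one training example in an assumed Alpaca-style layout."""
    prompt = (
        "### Instruction:\n" + INSTRUCTION + "\n\n"
        "### Input:\n" + findings + "\n\n"
        "### Response:\n"
    )
    return {"prompt": prompt, "completion": impression}


# Invented report for illustration only.
SAMPLE_REPORT = """
EXAMINATION: CHEST (PA AND LAT)

FINDINGS: The lungs are clear. No pleural effusion
or pneumothorax is seen.

IMPRESSION: No acute cardiopulmonary process.
"""

pair = extract_pair(SAMPLE_REPORT)
example = build_example(*pair)
print(example["prompt"] + example["completion"])
```

Reports that lack either section return `None` and would simply be skipped when building the training set. Real reports vary in header spelling and ordering, so the pattern would likely need to be broadened in practice.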

Figure 1: The overall framework of Radiology-GPT.

Experimental Results

Research Questions

RQ1: Can a radiology-domain LLM instruction-tuned on radiology reports outperform general instruction-tuned models in generating clinically useful impressions?
RQ2: Does domain-specific instruction tuning improve the understandability, coherence, relevance, conciseness, and clinical utility of radiology impressions compared to non-domain LLMs?
RQ3: What are the privacy and deployment implications of a localized Radiology-GPT in clinical settings?
RQ4: How does Radiology-GPT compare to ChatGPT in radiology-impression tasks, and where do trade-offs (e.g., conciseness vs. relevance) occur?

Key Findings

Radiology-GPT outperforms general language models such as StableLM, Dolly, and LLaMA on radiology impression tasks.
Radiology-GPT is comparable to ChatGPT in understandability and slightly better in coherence.
Radiology-GPT delivers higher conciseness and clinical utility than ChatGPT, but may have slightly lower relevance due to shorter outputs.
General-domain models lacking radiology-specific instruction tuning underperform Radiology-GPT and ChatGPT.
Domain-specific instruction tuning, combined with local deployment that keeps data on hospital infrastructure in support of HIPAA compliance, makes the model more practical for clinical radiology use.

Figure 2: The instruction-tuning process of Radiology-GPT.


This review was generated by AI and reviewed by a human editor.