QUICK REVIEW

[Paper Review] Can GPT-4V(ision) Serve Medical Applications? Case Studies on GPT-4V for Multimodal Medical Diagnosis

Chaoyi Wu, Jiayu Lei|arXiv (Cornell University)|Oct 15, 2023

Artificial Intelligence in Healthcare and Education | Cited by 46

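One-Sentence Summary

This study evaluates GPT-4V's multimodal medical diagnosis capabilities across 17 body systems and 8 imaging modalities, highlighting its strength in modality and anatomy recognition but substantial gaps in diagnosis, report generation, localisation, and multi-image reasoning.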

ABSTRACT

Driven by the large foundation models, the development of artificial intelligence has witnessed tremendous progress lately, leading to a surge of general interest from the public. In this study, we aim to assess the performance of OpenAI's newest model, GPT-4V(ision), specifically in the realm of multimodal medical diagnosis. Our evaluation encompasses 17 human body systems, including Central Nervous System, Head and Neck, Cardiac, Chest, Hematology, Hepatobiliary, Gastrointestinal, Urogenital, Gynecology, Obstetrics, Breast, Musculoskeletal, Spine, Vascular, Oncology, Trauma, Pediatrics, with images taken from 8 modalities used in daily clinic routine, e.g., X-ray, Computed Tomography (CT), Magnetic Resonance Imaging (MRI), Positron Emission Tomography (PET), Digital Subtraction Angiography (DSA), Mammography, Ultrasound, and Pathology. We probe the GPT-4V's ability on multiple clinical tasks with or without patient history provided, including imaging modality and anatomy recognition, disease diagnosis, report generation, and disease localisation. Our observation shows that, while GPT-4V demonstrates proficiency in distinguishing between medical image modalities and anatomy, it faces significant challenges in disease diagnosis and generating comprehensive reports. These findings underscore that while large multimodal models have made significant advancements in computer vision and natural language processing, it remains far from being used to effectively support real-world medical applications and clinical decision-making. All images used in this report can be found at https://github.com/chaoyi-wu/GPT-4V_Medical_Evaluation.

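Research Motivation and Objectives

Assess GPT-4V's ability to recognise imaging modalities and anatomical structures in medical images.
Assess GPT-4V's performance on diagnosis, report generation, and localisation across multiple imaging modalities.
Examine how patient history and multi-image input affect GPT-4V's output.
Identify the limitations and safety considerations of using GPT-4V clinically in radiology and pathology.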

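Proposed Method

Select radiology cases from Radiopaedia, covering 17 body systems and 8 imaging modalities.
Feed up to four 2D images to GPT-4V through the online interface, and prompt it to perform tasks such as diagnosis, report generation, and localisation.
Use the reference annotations from Radiopaedia/PathologyOutlines as the correctness baseline, while noting their limitations with respect to standard clinical report format.
Conduct two-round pathology evaluation dialogues (image only, then image plus tissue source) and a step-by-step localisation task (presence, bounding box, IOU).
Clip and normalise image intensities, and select key slices in line with guidance from experienced radiologists; evaluate multi-image input and cross-modality input separately.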
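
The intensity clipping and normalisation step listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `clip_and_normalise`, the window bounds, and the toy pixel values are all hypothetical, since this summary does not state which bounds were used.

```python
from typing import List, Sequence


def clip_and_normalise(
    pixels: Sequence[Sequence[float]], low: float, high: float
) -> List[List[float]]:
    """Clip each intensity to [low, high], then rescale linearly to [0, 1].

    `low` and `high` are hypothetical window bounds chosen by the caller;
    the summarised paper's actual values are not given here.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    span = high - low
    return [
        [(min(max(value, low), high) - low) / span for value in row]
        for row in pixels
    ]


# Toy 2x3 "slice" with CT-like values (made-up numbers for illustration).
toy_slice = [[-1200.0, -400.0, 0.0], [40.0, 400.0, 3000.0]]
normalised = clip_and_normalise(toy_slice, low=-1000.0, high=1000.0)
```

Values below the lower bound map to 0 and values above the upper bound map to 1, so extreme intensities no longer dominate the displayed contrast. A real pipeline would likely apply the same operation with an array library instead of nested lists.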

Figure 1: The Diagram of Medical Systems and Imaging Modalities. In this paper we comprehensively consider 17 medical systems (Figure a) and our cases can cover 8 different imaging modalities (Figure b), i.e., X-ray, CT, MRI, PET, DSA, Mammography, Ultrasound, Pathology from left to right.

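Experimental Results

Research Questions

RQ1: Can GPT-4V correctly identify the imaging modality and anatomical structures in medical images?
RQ2: Can GPT-4V localise anatomical structures or abnormalities in medical images?
RQ3: Can GPT-4V generate accurate and clinically meaningful radiology or pathology reports?
RQ4: How does GPT-4V perform when patient history or multiple images from different modalities are integrated?
RQ5: What are the limitations and safety considerations of using GPT-4V in real-world medical decision-making?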

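Key Findings

GPT-4V can identify the imaging modality and anatomical structures in many cases.
GPT-4V struggles with accurate disease diagnosis and comprehensive report generation.
GPT-4V can generate structured reports, but their content is often inaccurate.
GPT-4V can perform OCR on text and markers in images, but may misinterpret annotations.
GPT-4V can recognise medical devices and their positions.
GPT-4V struggles to analyse multiple images and to maintain context across dialogue turns.
GPT-4V's predictions are heavily influenced by patient history.
GPT-4V cannot reliably localise anatomical structures or abnormalities (low IOU, high variance).
Performance varies across cases, and the outputs show inconsistencies and raise safety concerns.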
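
To make the localisation finding concrete, the sketch below computes IOU (intersection over union) between predicted and reference bounding boxes, then the mean and variance across several predictions. It assumes axis-aligned boxes in `(x_min, y_min, x_max, y_max)` form. The box format, the helper name `iou`, and all coordinates are hypothetical illustrations, not data or code from the paper.

```python
from statistics import mean, pvariance
from typing import Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(box_a: Box, box_b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Made-up reference box and three made-up predictions for the same target.
reference: Box = (10.0, 10.0, 50.0, 50.0)
predictions = [
    (10.0, 10.0, 50.0, 50.0),  # exact match
    (30.0, 30.0, 70.0, 70.0),  # partial overlap
    (60.0, 60.0, 90.0, 90.0),  # no overlap
]

scores = [iou(pred, reference) for pred in predictions]
mean_iou = mean(scores)
iou_variance = pvariance(scores)  # population variance of the IOU scores
```

A low mean combined with a high variance, as in this toy example, indicates that predictions for the same target are both inaccurate on average and unstable from one attempt to the next.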

Figure 2: A Demonstration Case From Central Nervous System. “Cont.” denotes this sample is a continuation of the case titled as “Central Nervous System: Case 3”. Red in the figure denotes the incorrect parts, green denotes correct parts, and yellow denotes uncertainty. The colored sections within the

This review was generated by AI and reviewed by human editors.