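[Paper Walkthrough] Towards Conversational Diagnostic AI
AMIE is an LLM-based system optimized for diagnostic dialogue. It combines self-play learning in a simulated environment with an inference-time chain-of-reasoning strategy, and in a blinded remote OSCE study it outperformed primary care physicians on most evaluation axes.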
At the heart of medicine lies the physician-patient dialogue, where skillful history-taking paves the way for accurate diagnosis, effective management, and enduring trust. Artificial Intelligence (AI) systems capable of diagnostic dialogue could increase accessibility, consistency, and quality of care. However, approximating clinicians' expertise is an outstanding grand challenge. Here, we introduce AMIE (Articulate Medical Intelligence Explorer), a Large Language Model (LLM) based AI system optimized for diagnostic dialogue. AMIE uses a novel self-play based simulated environment with automated feedback mechanisms for scaling learning across diverse disease conditions, specialties, and contexts. We designed a framework for evaluating clinically-meaningful axes of performance including history-taking, diagnostic accuracy, management reasoning, communication skills, and empathy. We compared AMIE's performance to that of primary care physicians (PCPs) in a randomized, double-blind crossover study of text-based consultations with validated patient actors in the style of an Objective Structured Clinical Examination (OSCE). The study included 149 case scenarios from clinical providers in Canada, the UK, and India, 20 PCPs for comparison with AMIE, and evaluations by specialist physicians and patient actors. AMIE demonstrated greater diagnostic accuracy and superior performance on 28 of 32 axes according to specialist physicians and 24 of 26 axes according to patient actors. Our research has several limitations and should be interpreted with appropriate caution. Clinicians were limited to unfamiliar synchronous text-chat which permits large-scale LLM-patient interactions but is not representative of usual clinical practice. While further research is required before AMIE could be translated to real-world settings, the results represent a milestone towards conversational diagnostic AI.
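Motivation and Goals
- Use AI to improve the accessibility, consistency, and quality of diagnostic dialogue in medicine.
- Scale learning across diverse diseases and contexts through self-play in a simulated environment.
- Develop and validate an evaluation framework covering history-taking, diagnostic reasoning, management, communication, and empathy.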
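Proposed Method
- Fine-tune a base LLM (PaLM 2) for medical dialogue on real-world and simulated data.
- Build a self-play simulated diagnostic dialogue environment with an inner loop and an outer loop to enable continual learning.
- Apply a chain-of-reasoning procedure at inference time so that each response is grounded in the conversation history.
- Generate scenario-driven simulated dialogues with three agents (patient, doctor, moderator) plus a critic that provides feedback.
- Instruction-tune on the patient and doctor roles, medical question answering, reasoning, and summarization of electronic health record notes.
- Evaluate AMIE against PCPs in a blinded remote OSCE covering 149 cases with validated patient actors, supplemented by specialist physician ratings and questionnaires.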
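The two self-play loops listed above can be pictured with a toy sketch. It is only an illustration under stated assumptions: every function name here (`patient_turn`, `doctor_turn`, `moderator_should_stop`, `critic_feedback`, `inner_loop`, `outer_loop`) is hypothetical, the agents are string-producing stubs rather than language models, and the fine-tuning step is replaced by collecting dialogues into a list. It does not reproduce the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Dialogue:
    scenario: str
    turns: list = field(default_factory=list)


def patient_turn(scenario, turn_index):
    # Stub patient agent: a real one would role-play the scenario.
    return f"patient[{scenario}]: detail {turn_index}"


def doctor_turn(turn_index, feedback):
    # Stub doctor agent: a real one would condition on the critic's feedback.
    return f"doctor: question {turn_index} (using {len(feedback)} critiques)"


def moderator_should_stop(dialogue, max_turns):
    # Stub moderator: ends the conversation after a fixed number of turns.
    return len(dialogue.turns) >= max_turns


def critic_feedback(dialogue):
    # Stub critic: a real one would comment on the doctor's behaviour.
    return f"critique of a {len(dialogue.turns)}-turn dialogue"


def inner_loop(scenario, refinements=2, max_turns=6):
    """Re-simulate one scenario, feeding critic feedback into each retry."""
    feedback = []
    dialogue = Dialogue(scenario)
    for _ in range(refinements + 1):
        dialogue = Dialogue(scenario)
        turn_index = 0
        while not moderator_should_stop(dialogue, max_turns):
            dialogue.turns.append(patient_turn(scenario, turn_index))
            dialogue.turns.append(doctor_turn(turn_index, feedback))
            turn_index += 1
        feedback.append(critic_feedback(dialogue))
    return dialogue  # the last, most-refined simulation


def outer_loop(scenarios, rounds=2):
    """Collect refined dialogues each round (stand-in for fine-tuning)."""
    training_set = []
    for _ in range(rounds):
        training_set.extend(inner_loop(s) for s in scenarios)
    return training_set


if __name__ == "__main__":
    data = outer_loop(["cough", "chest pain"], rounds=2)
    print(len(data), "refined dialogues")
    print(data[0].turns[-1])
```

In this sketch the inner loop improves a single simulated conversation using feedback, while the outer loop gathers the refined conversations that a real system would feed into the next round of fine-tuning.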
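Experimental Results
Research Questions
- RQ1: In diagnostic dialogue across many diseases, can AMIE match or exceed the diagnostic accuracy of primary care physicians?
- RQ2: How does AMIE perform on history-taking, diagnostic reasoning, management planning, communication, and empathy?
- RQ3: What are the limitations of text-chat diagnostic consultations, and what steps are needed for translation to real-world clinical practice?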
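Key Findings
- AMIE achieved higher diagnostic accuracy than the PCPs in the OSCE study.
- According to specialist physicians, AMIE outperformed the PCPs on 28 of 32 axes.
- According to patient actors, AMIE outperformed the PCPs on 24 of 26 axes.
- AMIE was rated better than the PCPs on most evaluation axes and non-inferior on the rest.
- The evaluation used 149 case scenarios from Canada, the UK, and India, with 20 PCPs and validated patient actors.
- AMIE uses a chain-of-reasoning strategy to refine its response step by step at each conversation turn.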
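To put the axis counts above on a common scale, the short snippet below converts them to percentages. The counts come from the abstract; the `share` helper and the dictionary layout are just illustrative choices.

```python
# Axis counts reported in the abstract: (axes favouring AMIE, total axes).
reported = {
    "specialist physicians": (28, 32),
    "patient actors": (24, 26),
}


def share(favouring, total):
    """Fraction of evaluation axes on which AMIE was rated superior."""
    return favouring / total


for rater, (favouring, total) in reported.items():
    print(f"{rater}: {favouring}/{total} = {share(favouring, total):.1%}")
# specialist physicians: 28/32 = 87.5%
# patient actors: 24/26 = 92.3%
```

These percentages count axes only; they say nothing about how large the difference was on any individual axis.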
This walkthrough was generated by AI and reviewed by a human editor.