QUICK REVIEW

[Paper Digest] Agentic LLM Workflows for Generating Patient-Friendly Medical Reports

M. Sudarshan, Shu-Min Shih|arXiv (Cornell University)|Aug 2, 2024

Semantic Web and Ontologies|Cited by 8

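One-Sentence Summary

This paper proposes a multi-agent, reflection-based workflow that iteratively generates patient-friendly radiology letters, outperforming zero-shot prompting in both accuracy and readability while reducing the edits required.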

ABSTRACT

The application of Large Language Models (LLMs) in healthcare is expanding rapidly, with one potential use case being the translation of formal medical reports into patient-legible equivalents. Currently, LLM outputs often need to be edited and evaluated by a human to ensure both factual accuracy and comprehensibility, and this is true for the above use case. We aim to minimize this step by proposing an agentic workflow with the Reflexion framework, which uses iterative self-reflection to correct outputs from an LLM. This pipeline was tested and compared to zero-shot prompting on 16 randomized radiology reports. In our multi-agent approach, reports had an accuracy rate of 94.94% when looking at verification of ICD-10 codes, compared to zero-shot prompted reports, which had an accuracy rate of 68.23%. Additionally, 81.25% of the final reflected reports required no corrections for accuracy or readability, while only 25% of zero-shot prompted reports met these criteria without needing modifications. These results indicate that our approach presents a feasible method for communicating clinical findings to patients in a quick, efficient and coherent manner whilst also retaining medical accuracy. The codebase is available for viewing at http://github.com/malavikhasudarshan/Multi-Agent-Patient-Letter-Generation.
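The iterative self-reflection idea mentioned in the abstract can be sketched generically as a generate, evaluate, reflect loop. This is a minimal illustration, not the authors' implementation: the function name `reflexion_loop`, the three callables, the iteration cap `max_iters`, and the stopping threshold `good_enough` are all hypothetical, and a real pipeline would back `generate` and `reflect` with LLM calls.

```python
from typing import Callable, List, Tuple


def reflexion_loop(
    generate: Callable[[str, List[str]], str],   # (report, past reflections) -> letter
    evaluate: Callable[[str], float],            # letter -> score, higher is better
    reflect: Callable[[str, float], str],        # (letter, score) -> textual feedback
    report: str,
    max_iters: int = 3,                          # assumed cap, not taken from the paper
    good_enough: float = 0.95,                   # assumed early-stop threshold
) -> Tuple[str, float]:
    """Return the best-scoring letter seen across the iterations."""
    reflections: List[str] = []
    best_letter, best_score = "", float("-inf")
    for _ in range(max_iters):
        letter = generate(report, reflections)
        score = evaluate(letter)
        if score > best_score:
            best_letter, best_score = letter, score
        if score >= good_enough:
            break
        # Feed the critique of this attempt into the next generation step.
        reflections.append(reflect(letter, score))
    return best_letter, best_score
```

Keeping the best letter seen so far, rather than the last one, guards against a later iteration that scores worse than an earlier one.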

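Motivation and Objectives

Reduce the need for human verification when generating patient-facing medical letters from radiology reports.
Improve factual accuracy by preserving ICD-10 codes in the patient letter.
Improve readability to reach or approach a target grade reading level while preserving the medical content.
Demonstrate end-to-end integration with an electronic health record (EHR) server for automated deployment.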

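Proposed Method

Use a reflection-based multi-agent framework that iteratively improves LLM output through self-reflection.
Extract ICD-10 codes from the original report with an initial LLM pass.
Generate multiple patient-facing letters and extract ICD-10 codes from each so they can be checked against a master ICD-10 database.
Compute an overall score that combines readability (target Flesch-Kincaid (FK) grade of about 6.0) and accuracy (ICD-10 code agreement), weighted 0.3 and 0.7 respectively.
Iteratively refine with the Reflexion AlfWorld module, then select the best letter for deployment to the EHR.
Run zero-shot prompting with the same original prompt as a comparison to assess the improvement.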
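The scoring and selection steps in this list can be sketched as follows. Only the 0.3/0.7 weights and the FK target of about 6.0 come from the summary; how readability is mapped onto a 0 to 1 scale is not stated here, so `readability_score` uses an assumed linear penalty, and every function name, the candidate tuple layout, and the example ICD-10 codes are hypothetical.

```python
from typing import List, Set, Tuple

TARGET_FK_GRADE = 6.0                 # target grade level from the summary
W_READABILITY, W_ACCURACY = 0.3, 0.7  # weights from the summary


def icd10_accuracy(source_codes: Set[str], letter_codes: Set[str]) -> float:
    """Fraction of the source report's ICD-10 codes that reappear in the letter."""
    if not source_codes:
        return 1.0  # assumption: nothing to preserve counts as fully accurate
    return len(source_codes & letter_codes) / len(source_codes)


def readability_score(fk_grade: float, target: float = TARGET_FK_GRADE) -> float:
    """Assumed normalization: 1.0 at or below target, falling linearly to 0."""
    if fk_grade <= target:
        return 1.0
    return max(0.0, 1.0 - (fk_grade - target) / target)


def overall_score(fk_grade: float, source_codes: Set[str], letter_codes: Set[str]) -> float:
    """Weighted sum of the readability and accuracy components."""
    return (W_READABILITY * readability_score(fk_grade)
            + W_ACCURACY * icd10_accuracy(source_codes, letter_codes))


# A candidate is (letter_text, fk_grade, codes_extracted_from_letter).
Candidate = Tuple[str, float, Set[str]]


def select_best(candidates: List[Candidate], source_codes: Set[str]) -> Candidate:
    """Pick the candidate letter with the highest overall score."""
    return max(candidates, key=lambda c: overall_score(c[1], source_codes, c[2]))


if __name__ == "__main__":
    source = {"J18.9", "I10"}  # illustrative codes only
    letters = [
        ("easy to read, one code dropped", 6.0, {"J18.9"}),
        ("harder to read, all codes kept", 9.0, {"J18.9", "I10"}),
    ]
    print(select_best(letters, source)[0])
```

Because accuracy carries the larger weight, a letter that keeps every code can outrank an easier-to-read letter that drops one, which matches the priority the summary places on factual accuracy.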

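Experimental Results

Research Questions

RQ1: Does the multi-agent, reflection-based workflow improve retention of ICD-10 codes in patient-facing letters relative to zero-shot prompting?
RQ2: Does the method raise readability to a patient-friendly level without reducing accuracy?
RQ3: After reflection-based processing, what proportion of generated letters need no further modification?
RQ4: Can the final letters be reliably pushed back to the EHR server for patient access?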

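Key Findings

The final reflected letters reached 94.94% ICD-10 code accuracy, compared with 68.23% for zero-shot prompting.
81.25% of the final reflected reports needed no corrections for accuracy or readability, compared with 25% of zero-shot outputs.
Across the 16 test radiology reports, zero-shot prompting required edits to 11/16, while the agentic workflow required edits to only 3/16.
On average, the reflected letters improved accuracy by 26.71 percentage points (94.94% versus 68.23%) and readability by 3.29%, yielding an overall score 17.51% higher.
Readability averaged an FK grade of 11.03, against a target of about 6.0 for patient-facing material; the method aims to move closer to that target.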
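For readers unfamiliar with the FK grade quoted above, the standard Flesch-Kincaid grade level formula is 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The sketch below applies it with a deliberately rough vowel-group syllable heuristic; the function names are hypothetical, and the tool the authors used to measure readability is not specified in this digest, so its numbers may differ from this approximation.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups and drop a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade level using the standard published coefficients."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


if __name__ == "__main__":
    plain = "The scan is normal."
    formal = "Radiographic evaluation demonstrates bilateral consolidation."
    print(fk_grade(plain) < fk_grade(formal))
```

Short sentences built from short words push the grade down, which is why rewriting a formal report in plain language moves it toward the 6.0 target.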

This digest was generated by AI and reviewed by a human editor.