QUICK REVIEW

[Paper Review] Roleplay-doh: Enabling Domain-Experts to Create LLM-simulated Patients via Eliciting and Adhering to Principles

Ryan Louie, Ananjan Nandi|arXiv (Cornell University)|Jul 1, 2024

Simulation-Based Education in Healthcare|Cited by 5

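One-sentence summary

TL;DR: Roleplay-doh lets domain experts create LLM-simulated patients by converting their qualitative feedback into natural-language principles and applying a principle-adherence pipeline, which improves realism and consistency in counseling roleplay.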

ABSTRACT

Recent works leverage LLMs to roleplay realistic social scenarios, aiding novices in practicing their social skills. However, simulating sensitive interactions, such as in mental health, is challenging. Privacy concerns restrict data access, and collecting expert feedback, although vital, is laborious. To address this, we develop Roleplay-doh, a novel human-LLM collaboration pipeline that elicits qualitative feedback from a domain-expert, which is transformed into a set of principles, or natural language rules, that govern an LLM-prompted roleplay. We apply this pipeline to enable senior mental health supporters to create customized AI patients for simulated practice partners for novice counselors. After uncovering issues in GPT-4 simulations not adhering to expert-defined principles, we also introduce a novel principle-adherence prompting pipeline which shows 30% improvements in response quality and principle following for the downstream task. Via a user study with 25 counseling experts, we demonstrate that the pipeline makes it easy and effective to create AI patients that more faithfully resemble real patients, as judged by creators and third-party counselors. See our project website at https://roleplay-doh.github.io/ for code and data.

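Research motivation and goals

Enable domain experts to create LLM-simulated patients for training novice counselors without requiring technical prompt-writing expertise.
Convert experts' qualitative feedback into natural-language principles that govern an LLM-prompted roleplay.
Address the challenge of principle adherence in multi-principle prompts, to improve response quality and consistency.
Evaluate the approach with counseling experts to assess realism and training effectiveness.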

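Proposed method

Roleplay-doh, an interactive tool: experts review the AI patient's responses in natural language (kudos / critique / rewrite).
An LLM converts the expert feedback into a set of constitution-style principles that guide the AI patient's behavior.
The principle-adherence pipeline breaks multi-part principles down into yes/no questions (Principle-as-Questions Rewriter).
An automatic principle generator adds context-relevant criteria for dialogue coherence and consistency.
An applicability and adherence evaluator checks which principles apply to the current context and refines the response accordingly.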
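
The adherence steps listed above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`rewrite_as_questions`, `generate_criteria`, `assess`, `adhere`), the prompt tags (`REWRITE`, `CRITERIA`, and so on), and the `toy_llm` stand-in are all hypothetical, and a real system would replace `toy_llm` with calls to an actual model and carefully written prompts.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interface: any function mapping a prompt to a completion.
LLM = Callable[[str], str]


@dataclass
class Check:
    question: str     # yes/no question derived from a principle
    applicable: bool  # does it apply to the current dialogue context?
    satisfied: bool   # does the draft response pass it?


def _lines(text: str) -> List[str]:
    return [line.strip() for line in text.splitlines() if line.strip()]


def _is_yes(reply: str) -> bool:
    return reply.strip().lower().startswith("yes")


def rewrite_as_questions(principles: List[str], llm: LLM) -> List[str]:
    """Break each principle into one or more yes/no questions."""
    questions: List[str] = []
    for principle in principles:
        questions += _lines(llm(f"REWRITE\n{principle}"))
    return questions


def generate_criteria(context: str, llm: LLM) -> List[str]:
    """Add context-specific coherence/consistency questions."""
    return _lines(llm(f"CRITERIA\n{context}"))


def assess(questions: List[str], context: str, draft: str, llm: LLM) -> List[Check]:
    """Judge applicability first, then adherence for applicable questions."""
    checks = []
    for q in questions:
        applicable = _is_yes(llm(f"APPLICABLE\n{q}\n{context}"))
        satisfied = applicable and _is_yes(llm(f"SATISFIED\n{q}\n{draft}"))
        checks.append(Check(q, applicable, satisfied))
    return checks


def adhere(principles: List[str], context: str, draft: str, llm: LLM) -> str:
    """Return the draft unchanged if it passes, otherwise a revised response."""
    questions = rewrite_as_questions(principles, llm) + generate_criteria(context, llm)
    failed = [c.question for c in assess(questions, context, draft, llm)
              if c.applicable and not c.satisfied]
    if not failed:
        return draft
    return llm("REVISE\n" + draft + "\n" + "\n".join(failed))


def toy_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model (assumption, demo only)."""
    task, _, body = prompt.partition("\n")
    if task == "REWRITE":
        return "Is the response at most 12 words long?"
    if task == "CRITERIA":
        return "Does the response stay on the topic the counselor raised?"
    if task == "APPLICABLE":
        return "yes"
    if task == "SATISFIED":
        question, _, draft = body.partition("\n")
        if "12 words" in question:
            return "yes" if len(draft.split()) <= 12 else "no"
        return "yes"
    if task == "REVISE":
        draft = body.partition("\n")[0]
        return " ".join(draft.split()[:12])  # crude "fix": truncate
    return ""


principles = ["Keep replies short, as a guarded patient would."]
context = "Counselor: What brings you here today?"
long_draft = ("Well, it all started years ago when I moved to a new city "
              "and lost touch with everyone I knew.")
short_draft = "I don't know. Work, I guess."

revised = adhere(principles, context, long_draft, toy_llm)
unchanged = adhere(principles, context, short_draft, toy_llm)
```

In this toy run the long draft fails the length question and is revised, while the short draft passes every applicable question and is returned as-is. Separating applicability from adherence means a principle that does not fit the current turn cannot trigger a rewrite.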

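Experimental results

Research questions

RQ1: Can experts' qualitative feedback be efficiently converted into actionable principles that govern an LLM-simulated patient?
RQ2: Does the principle-adherence pipeline improve adherence to expert principles and dialogue quality in AI patient roleplay?
RQ3: Do AI patients created with Roleplay-doh resemble real patients more closely, and do they make effective training partners for novice counselors?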

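Key findings

25 counseling experts took part in a within-subjects study comparing a scenario-only AI patient with a scenario+principles AI patient.
Experts rated the scenario+principles AI patient higher than the scenario-only AI patient on authenticity, resemblance to past cases, and readiness for training.
Third-party counselors also rated the scenario+principles AI patient higher on several dimensions, supporting external validity.
Compared with ablations, the principle-adherence pipeline improved principle adherence (win 35%, loss 5%) and dialogue consistency (win 35%, loss 10%).
In an evaluation on 40 test cases, the full pipeline outperformed the baseline on consistency (M1) and principle adherence (M3).
The method reduced awkwardness in responses (M2), and the results highlighted the importance of the yes/no rewriter and automatic criteria generation.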
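
The win and loss percentages above come from pairwise comparisons, in which each judgment says whether the pipeline's response was better than, tied with, or worse than the alternative. The snippet below is a small sketch of how such rates could be tallied. The function name `pairwise_rates` and the judgment list are invented for illustration; the summary does not state the underlying counts, so the toy data deliberately does not reproduce the reported figures.

```python
from collections import Counter
from typing import Dict, Iterable

LABELS = ("win", "tie", "loss")


def pairwise_rates(judgments: Iterable[str]) -> Dict[str, float]:
    """Fraction of pairwise judgments falling under each label."""
    counts = Counter(judgments)
    unknown = set(counts) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no judgments given")
    return {label: counts[label] / total for label in LABELS}


# Invented toy data: 10 judgments, not taken from the paper.
toy_judgments = ["win"] * 4 + ["tie"] * 5 + ["loss"] * 1
rates = pairwise_rates(toy_judgments)
```

Reporting win and loss separately, rather than a single net score, keeps ties visible: a 35% win rate with a 5% loss rate implies that most comparisons were judged as ties.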

This review was generated by AI and checked by a human editor.