QUICK REVIEW

[Paper Review] Roleplay-doh: Enabling Domain-Experts to Create LLM-simulated Patients via Eliciting and Adhering to Principles

Ryan Louie, Ananjan Nandi | arXiv (Cornell University) | 2024. 07. 01.

Simulation-Based Education in Healthcare | Citations: 5

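One-line summary

Roleplay-doh lets domain experts create LLM-simulated patients: it elicits their qualitative feedback as natural-language principles and applies a principle-adherence pipeline to make counseling roleplays more realistic and consistent.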

ABSTRACT

Recent works leverage LLMs to roleplay realistic social scenarios, aiding novices in practicing their social skills. However, simulating sensitive interactions, such as in mental health, is challenging. Privacy concerns restrict data access, and collecting expert feedback, although vital, is laborious. To address this, we develop Roleplay-doh, a novel human-LLM collaboration pipeline that elicits qualitative feedback from a domain-expert, which is transformed into a set of principles, or natural language rules, that govern an LLM-prompted roleplay. We apply this pipeline to enable senior mental health supporters to create customized AI patients for simulated practice partners for novice counselors. After uncovering issues in GPT-4 simulations not adhering to expert-defined principles, we also introduce a novel principle-adherence prompting pipeline which shows 30% improvements in response quality and principle following for the downstream task. Via a user study with 25 counseling experts, we demonstrate that the pipeline makes it easy and effective to create AI patients that more faithfully resemble real patients, as judged by creators and third-party counselors. See our project website at https://roleplay-doh.github.io/ for code and data.

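Research motivation and goals

Enable domain experts to create LLM-simulated patients for training novice counselors without expertise in technical prompt writing.
Convert experts' qualitative feedback into natural-language principles that govern the LLM-prompted roleplay.
Address principle-adherence failures in prompts that contain multiple principles, improving response quality and consistency.
Evaluate the approach with counseling experts to assess its realism and usefulness for training.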

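Proposed method

Roleplay-doh is an interactive tool in which an expert critiques, praises, or rewrites AI-patient responses in natural language.
An LLM converts the expert's feedback into a set of principles that serve as behavioral guidelines.
The principle-adherence pipeline decomposes multi-part principles into yes/no questions (Principle-as-Questions Rewriter).
The Automatic Principle Generator adds context-relevant evaluation criteria for dialogue consistency and coherence.
The Applicability and Adherence Evaluator identifies which principles apply in the current context and refines the response accordingly.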
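
The three pipeline components above can be read as a check-then-revise loop. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the function names, the prompt wording, and the `llm` callable (any function that maps a prompt string to text) are hypothetical, and `fake_llm` is a deterministic stand-in so the example runs without a model.

```python
from typing import Callable, List

# Hypothetical LLM interface: takes a prompt string, returns the model's text.
LLM = Callable[[str], str]


def _lines(text: str) -> List[str]:
    """Split model output into non-empty, stripped lines."""
    return [ln.strip() for ln in text.splitlines() if ln.strip()]


def _is_yes(text: str) -> bool:
    """Interpret a model answer as yes/no."""
    return text.strip().lower().startswith("yes")


def rewrite_as_questions(principle: str, llm: LLM) -> List[str]:
    """Principle-as-Questions step: one yes/no question per part of a principle."""
    return _lines(llm("REWRITE: Rewrite as yes/no questions, one per line.\n" + principle))


def generate_auto_criteria(dialogue: List[str], llm: LLM) -> List[str]:
    """Automatic criteria step: extra checks for consistency with the dialogue."""
    return _lines(llm("CRITERIA: Write yes/no consistency checks.\n" + "\n".join(dialogue)))


def violated(question: str, dialogue: List[str], response: str, llm: LLM) -> bool:
    """Applicability and adherence step: True only if the check applies and fails."""
    context = "\n".join(dialogue)
    body = f"Question: {question}\nDialogue:\n{context}\nResponse: {response}"
    if not _is_yes(llm("APPLICABLE: Does this check apply here?\n" + body)):
        return False  # checks that do not apply are skipped
    return not _is_yes(llm("ADHERED: Does the response satisfy the check?\n" + body))


def refine(principles: List[str], dialogue: List[str], response: str, llm: LLM) -> str:
    """Return the draft unchanged if every applicable check passes, else a revision."""
    questions = [q for p in principles for q in rewrite_as_questions(p, llm)]
    questions += generate_auto_criteria(dialogue, llm)
    failed = [q for q in questions if violated(q, dialogue, response, llm)]
    if not failed:
        return response
    return llm("REVISE: Rewrite the response so these checks pass.\n"
               + "\n".join(failed) + "\nResponse: " + response)


def fake_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model, keyed on the prompt tag."""
    tag = prompt.split(":", 1)[0]
    if tag == "REWRITE":
        return ("Does the patient avoid giving long answers?\n"
                "Does the patient hesitate before sharing feelings?")
    if tag == "CRITERIA":
        return "Is the reply consistent with what the patient said earlier?"
    if tag == "APPLICABLE":
        return "yes"
    if tag == "ADHERED":
        # Pretend only the 'long answers' check fails on the draft.
        return "no" if "long answers" in prompt else "yes"
    if tag == "REVISE":
        return "I don't know... it's been hard."
    raise ValueError("unexpected prompt tag: " + tag)


dialogue = ["Counselor: How have you been feeling lately?"]
draft = "I feel sad because of work, family, money, and sleep, and here is everything in detail."
principles = ["Keep answers short and hesitate before sharing feelings."]
revised = refine(principles, dialogue, draft, fake_llm)
```

With the stand-in model, one of the three checks fails, so `refine` returns the revised reply instead of the draft. Splitting each principle into separate yes/no questions means a draft that satisfies only part of a multi-part principle is still caught.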

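Experimental results

Research questions

RQ1: Can experts' qualitative feedback be efficiently converted into actionable principles that govern an LLM-simulated patient?
RQ2: Does the principle-adherence pipeline improve adherence to expert principles and dialogue quality in AI-patient roleplay?
RQ3: Do AI patients created with Roleplay-doh resemble real patients more closely and serve as effective training partners for novice counselors?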

Key results

25 counseling experts participated in a within-subjects study comparing scenario-only vs scenario+principles AI patients.
Experts rated scenario+principles AI patients higher on authenticity, resemblance to past cases, and training-readiness than scenario-only.
Third-party counselors also rated scenario+principles AI patients higher on several dimensions, supporting external validity.
The principle-adherence pipeline improved principle following (win 35%, loss 5%) and dialogue consistency (win 35%, loss 10%) over ablations.
Across 40 test-case evaluations, the Full pipeline outperformed baselines on consistency (M1) and principle adherence (M3).
The approach reduced awkwardness in responses (M2) and highlighted the importance of both the Yes/No rewriter and automatic criterion generation.
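
The win and loss percentages reported above do not sum to 100%, which suggests the remaining comparisons were ties. As a rough illustration of how such rates are tallied from pairwise judgments, here is a small Python sketch. The labels below are hypothetical, chosen only so the arithmetic reproduces a 35% win and 5% loss split; they are not the study's data, and the function name is an assumption.

```python
from collections import Counter
from typing import List, Tuple


def win_tie_loss_rates(judgments: List[str]) -> Tuple[float, float, float]:
    """Return (win, tie, loss) fractions for a list of 'win'/'tie'/'loss' labels."""
    if not judgments:
        raise ValueError("no judgments to tally")
    counts = Counter(judgments)
    total = len(judgments)
    return (counts["win"] / total, counts["tie"] / total, counts["loss"] / total)


# Hypothetical pairwise labels (pipeline vs. ablation), for illustration only.
judgments = ["win"] * 7 + ["tie"] * 12 + ["loss"] * 1
win_rate, tie_rate, loss_rate = win_tie_loss_rates(judgments)
```

Under these made-up labels, 7 of 20 comparisons are wins and 1 of 20 is a loss, leaving 12 of 20 as ties.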

This review was generated by AI and reviewed by a human editor.