QUICK REVIEW

[Paper Review] SEAD: Self-Evolving Agent for Multi-Turn Service Dialogue

Yuqin Dai, Ning Gao | arXiv (Cornell University) | 2026-02-03

Speech and dialogue systems | Citations: 0

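One-Line Summary

SEAD is a zero-annotation, self-evolving framework that trains a service agent without large-scale annotated data: it decouples user modeling into a Profile Controller and a User Role-Play Model, and trains the agent through an adaptive curriculum.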

ABSTRACT

Large Language Models have demonstrated remarkable capabilities in open-domain dialogues. However, current methods exhibit suboptimal performance in service dialogues, as they rely on noisy, low-quality human conversation data. This limitation arises from data scarcity and the difficulty of simulating authentic, goal-oriented user behaviors. To address these issues, we propose SEAD (Self-Evolving Agent for Service Dialogue), a framework that enables agents to learn effective strategies without large-scale human annotations. SEAD decouples user modeling into two components: a Profile Controller that generates diverse user states to manage training curriculum, and a User Role-play Model that focuses on realistic role-playing. This design ensures the environment provides adaptive training scenarios rather than acting as an unfair adversary. Experiments demonstrate that SEAD significantly outperforms Open-source Foundation Models and Closed-source Commercial Models, improving task completion rate by 17.6% and dialogue efficiency by 11.1%. Code is available at: https://github.com/Da1yuqin/SEAD.

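Research Motivation and Goals

Addresses data scarcity and the low quality of human conversation data in goal-oriented service dialogue.
Decouples user modeling into a diverse profile generator and a realistic role-play model, which makes fair adversarial training possible.
Builds an adaptive curriculum that uses Mistake Analysis to keep training difficulty at roughly the 50% level.
Demonstrates that SEAD can outperform open-source and commercial models with almost no annotation and a smaller model size.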
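
The idea of holding difficulty near 50% can be sketched as a weighted sampler over user-profile buckets. This is only an illustration under stated assumptions, not the paper's procedure: it assumes "about 50%" refers to the agent's success rate on a bucket, and the bucket names, the `sampling_weights` and `sample_bucket` helpers, and the `floor` parameter are all hypothetical.

```python
import random

# Assumption: the "about 50%" difficulty level is read as a target success rate.
TARGET_SUCCESS = 0.5


def sampling_weights(success_rates, target=TARGET_SUCCESS, floor=0.05):
    """Weight each profile bucket by how close its success rate is to the target.

    Buckets the agent almost always wins or almost always loses get a small
    but non-zero weight (floor), so they are still revisited occasionally.
    """
    weights = {}
    for bucket, rate in success_rates.items():
        # 1.0 when rate == target, falling linearly to 0.0 at the far extreme.
        closeness = 1.0 - abs(rate - target) / max(target, 1.0 - target)
        weights[bucket] = max(closeness, floor)
    return weights


def sample_bucket(success_rates, rng=random):
    """Draw the next profile bucket to train on, favoring mid-difficulty ones."""
    weights = sampling_weights(success_rates)
    buckets = list(weights)
    return rng.choices(buckets, weights=[weights[b] for b in buckets], k=1)[0]


# Hypothetical per-bucket success rates measured in the previous training round.
success_rates = {"cooperative": 0.9, "hesitant": 0.5, "hostile": 0.1}
weights = sampling_weights(success_rates)
```

With these made-up rates, the "hesitant" bucket receives the largest weight because the agent succeeds on it about half the time, while the too-easy and too-hard buckets are sampled less often.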

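Proposed Method

Introduces a Profile Generator that samples diverse initial user states and takes part in adversarial training.
Introduces a User Role-Play Model that simulates user responses without controlling the outcome.
Models multi-turn dialogue as a sequential decision process in which the agent's state estimates guide its actions.
Uses Group Relative Policy Optimization (GRPO) with trajectory-level advantages to update the service agent.
Adjusts difficulty through Mistake Analysis, which guides future profile sampling in a closed loop.
Builds diverse, authentic user profiles from anonymized real-world behavior patterns.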
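
The trajectory-level advantage used by GRPO can be illustrated with a short sketch. It follows the commonly described form of GRPO, where each rollout's reward is normalized against the other rollouts sampled for the same starting condition; the review does not give SEAD's exact reward or normalization, so the binary rewards, the `eps` term, and the use of the population standard deviation are assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each trajectory's reward against its own sampling group.

    rewards: one scalar reward per dialogue trajectory, all rolled out from
    the same starting user profile. The returned advantage is shared by
    every step of the corresponding trajectory.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; variants differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four rollouts: 1.0 = task completed, 0.0 = failed.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Successful dialogues in the group end up with positive advantages and failed ones with negative advantages, so no separate value model is needed to score a trajectory. When every rollout in a group earns the same reward, all advantages are zero and that group contributes no learning signal, which is one reason a curriculum that avoids always-won or always-lost scenarios is useful.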

Experimental Results

Research Questions

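RQ1: Can SEAD achieve a high task completion rate in multi-turn service dialogue without large-scale annotated data?
RQ2: Does decomposing user modeling into a Profile Controller and a Role-Play Model enable fair adversarial training and effective curriculum design?
RQ3: Does adaptive difficulty driven by Mistake Analysis improve agent learning and performance across diverse user profiles?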

Key Results

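SEAD achieves a 52.0% completion rate with a 14B model, outperforming GPT-4o (44.2%).
SEAD reduces Average Turns to Target to 9.6 turns.
SEAD delivers competitive user profile accuracy (0.912) and strong state improvement (EI 0.63, TI 1.57, CI 1.55).
Ablation experiments show that all three components (profile sampling, Mistake Analysis, and URM decoupling) are essential for best performance.
Even without annotated dialogue data, SEAD shows higher task efficiency and realism at a much smaller model size.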
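
As a quick arithmetic check on the completion-rate figures above, the gap to GPT-4o can be expressed both in percentage points and as a relative gain. The review does not say which baseline the abstract's 17.6% improvement is measured against, so treating GPT-4o as that baseline is an assumption; `relative_improvement` is a hypothetical helper.

```python
def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


sead_completion = 52.0   # % completion rate reported for SEAD (14B)
gpt4o_completion = 44.2  # % completion rate reported for GPT-4o

point_gain = sead_completion - gpt4o_completion
relative_gain = relative_improvement(sead_completion, gpt4o_completion)

print(f"{point_gain:.1f} points, {relative_gain:.1f}% relative")
```

Under that assumption the gap is 7.8 percentage points, or about 17.6% relative, which is consistent with the figure quoted in the abstract.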

This review was generated by AI and checked by a human editor.