QUICK REVIEW

[Paper Review] Structure Matters: Evaluating Multi-Agents Orchestration in Generative Therapeutic Chatbots

Sina Elahimanesh, Mohammadali Mohammadkhani|arXiv (Cornell University)|Feb 28, 2026

Digital Mental Health Interventions | Citations: 0

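One-Line Summary

This study compares three LLM-based therapeutic chatbot architectures (a multi-agent FSM with long-term memory, a single agent with SAT knowledge, and an unguided GPT-4o). In a Farsi-language, SAT-based therapy setting, the multi-agent FSM design is perceived as significantly more natural and human-like and improves interaction quality relative to the alternatives.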

ABSTRACT

While large language models (LLMs) excel at open-ended dialogue, effective psychotherapy requires structured progression and adherence to clinical protocols, making the design of psychotherapist chatbots challenging. We investigate how different LLM-based designs shape perceived therapeutic dialogue in a chatbot grounded in the Self-Attachment Technique (SAT), a novel self-administered psychotherapy rooted in attachment theory. We compare three architectural variants: (1) a multi-agent system utilizing finite state machine aligned with therapeutic stages and a shared long-term memory, (2) a single-agent using identical knowledge-base and the same prompts, and (3) an unguided LLM. In an eight-day randomized controlled trial (RCT) with N=66 Farsi-speaking participants, balanced across the three chatbots, the multi-agent system is perceived as significantly more natural and human-like than the other variants and achieves higher ratings across most other metrics. These findings demonstrate that for therapeutic AI, architectural orchestration is as critical as prompt engineering in fostering natural, engaging dialogue.

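Motivation and Objectives

Evaluate how the architectural design of LLM-based therapeutic chatbots affects perceived therapeutic quality.
Compare three architectures under controlled conditions: a multi-agent FSM with long-term memory, a single agent with SAT knowledge, and an unguided LLM.
Examine the effects on naturalness, trust, empathy, memory, satisfaction, and conversational focus.
Investigate the mechanisms by which architectural structure shapes dialogue dynamics and engagement.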

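Proposed Method

A three-condition, between-subjects RCT with N=66 participants assigned to Alpha (multi-agent FSM with long-term memory), Beta (single agent with SAT content), or Gamma (unguided single agent).
All conditions use GPT-4o as the base model. Prompts and the interface are held identical across conditions; the prompts are written and designed in English, while the chatbot is deployed in Farsi.
Alpha uses a 12-state SAT-aligned FSM with shared long-term memory, plus adaptive Retrieval-Augmented Generation (RAG) for personalized exercises.
Beta uses the same SAT content and exercises but relies on a single prompt with no explicit FSM enforcement.
Gamma provides a minimal LLM setup with no SAT knowledge or structured goals.
A calendar-based model tracks day-by-day progression from day 1 through day 8, and long-term memory summaries are generated.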
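
To make the Alpha design more concrete, here is a minimal sketch of an FSM that routes each user turn to a stage-specific agent while all agents share one long-term memory. It is an illustration under assumptions, not the authors' implementation: the four stage names, the `SharedMemory` and `Orchestrator` classes, and the "advance one stage per turn" rule are all hypothetical (the paper's FSM has 12 SAT-aligned states, which this review does not list), and the agents are stand-ins that do not call an LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stage names; the paper's FSM has 12 SAT-aligned states,
# which are not listed in this review.
STAGES = ["greeting", "check_in", "exercise", "wrap_up"]


@dataclass
class SharedMemory:
    """Long-term memory shared by every stage agent (assumed structure)."""
    summaries: List[str] = field(default_factory=list)

    def remember(self, note: str) -> None:
        self.summaries.append(note)


def make_agent(stage: str) -> Callable[[str, SharedMemory], str]:
    """Return a stand-in agent; a real one would call an LLM with a stage prompt."""
    def agent(user_msg: str, memory: SharedMemory) -> str:
        memory.remember(f"{stage}: {user_msg}")
        return f"[{stage}] reply using {len(memory.summaries)} memory item(s)"
    return agent


class Orchestrator:
    """Routes each user turn to the agent for the current state, then advances."""

    def __init__(self) -> None:
        self.memory = SharedMemory()
        self.agents: Dict[str, Callable[[str, SharedMemory], str]] = {
            s: make_agent(s) for s in STAGES
        }
        self.index = 0

    @property
    def state(self) -> str:
        return STAGES[self.index]

    def step(self, user_msg: str) -> str:
        reply = self.agents[self.state](user_msg, self.memory)
        # Assumed transition rule: advance one stage per turn, stay at the last.
        self.index = min(self.index + 1, len(STAGES) - 1)
        return reply
```

In this sketch the state machine, not the prompt, decides which agent speaks next. That is the contrast with Beta, which per the description above relies on a single prompt without explicit FSM enforcement.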

Figure 1. Overview of the user study comprising three phases: (1) recruitment and blinded RCT group assignment; (2) an eight-day study period during which participants interacted with one of three therapeutic chatbot versions, multi-agent FSM-based, single-agent with therapy knowledge, or unguided s

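Experimental Results

Research Questions

RQ1: Does architectural orchestration (a multi-agent FSM with long-term memory) improve perceived naturalness compared with a single-agent SAT-informed system and an unguided LLM?
RQ2: Which dialogue dynamics (turn-taking, message length, agent-to-user message ratio) emerge under the different architectures?
RQ3: To what extent do architectural differences affect trust, empathy, memory consistency, and satisfaction in SAT-informed chatbots?
RQ4: Over the eight-day trial, how well do the different systems align with therapeutic progression and memory retention?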
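
The dialogue dynamics named in RQ2 can be computed from a chat log roughly as follows. This is a sketch under assumptions: the `(speaker, text)` log format and the function name `dialogue_metrics` are hypothetical, and defining the agent-to-user ratio as a ratio of mean message lengths in characters is one plausible reading, since the review does not state the paper's exact definition.

```python
from statistics import mean
from typing import Dict, List, Tuple

# A chat log as (speaker, text) pairs; speaker is "agent" or "user".
Log = List[Tuple[str, str]]


def dialogue_metrics(log: Log) -> Dict[str, float]:
    """Summarize message counts, mean lengths, and speaker changes in a log."""
    agent = [len(text) for speaker, text in log if speaker == "agent"]
    user = [len(text) for speaker, text in log if speaker == "user"]
    # A turn switch is any point where the speaker changes.
    switches = sum(1 for a, b in zip(log, log[1:]) if a[0] != b[0])
    return {
        "agent_messages": len(agent),
        "user_messages": len(user),
        "mean_agent_chars": mean(agent),
        "mean_user_chars": mean(user),
        # Assumed definition: ratio of mean lengths, in characters.
        "agent_to_user_length_ratio": mean(agent) / mean(user),
        "turn_switches": switches,
    }
```

The function expects at least one agent message and one non-empty user message; otherwise the mean or the ratio is undefined.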

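Key Findings

Alpha was rated significantly more natural and human-like than Beta and Gamma (mean 3.955, SD 0.950, versus 3.043, SD 0.825, and 3.211, SD 0.787, respectively).
The statistical test gave F=7.017, p_perm=0.0018, η^2=0.187, meaning architectural design explains about 19% of the variance in ratings.
Alpha produced more but shorter messages (459 in total, about 230 characters) than Beta (336, about 409 characters) and Gamma (206, about 635 characters).
Alpha participants sent shorter user messages on average (29.0 characters) than Beta (38.9) and Gamma (42.8) participants.
Alpha's dialogue had a lower agent-to-user message ratio (7.9:1) than Beta (10.5:1) and Gamma (13.4:1).
Table 1 shows Alpha outperforming Beta and Gamma on most interaction metrics, most notably naturalness; usability metrics are roughly equal across conditions.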
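
The F, p_perm, and η^2 values reported above are the kind of output produced by a one-way ANOVA with a permutation test. The sketch below shows how such statistics can be computed in general; it is not the authors' analysis code. The function names are hypothetical, η^2 is taken as SS_between / SS_total, and the permutation p-value uses an add-one correction, which is a common convention assumed here rather than something the review confirms.

```python
import random
from typing import Sequence, Tuple


def f_and_eta_squared(groups: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """One-way ANOVA F statistic and eta squared (SS_between / SS_total)."""
    values = [x for g in groups for x in g]
    grand = sum(values) / len(values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, ss_between / (ss_between + ss_within)


def permutation_p(groups: Sequence[Sequence[float]],
                  n_perm: int = 2000, seed: int = 0) -> float:
    """Share of label shufflings whose F is at least the observed F."""
    rng = random.Random(seed)
    observed, _ = f_and_eta_squared(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for n in sizes:
            shuffled.append(pooled[start:start + n])
            start += n
        if f_and_eta_squared(shuffled)[0] >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

With per-participant naturalness ratings for Alpha, Beta, and Gamma as the three groups, `f_and_eta_squared` would return the F statistic and the share of rating variance attributable to the condition, which is how an η^2 of 0.187 reads as "about 19%".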

Figure 2. Screenshot of the web-based user interface of the chatbot. After logging in, users are directed to the home screen where they can start interacting with the chatbot. (A) shows the list of user messages and corresponding chatbot responses. (B) is the input area for composing and sending mes


This review was generated by AI and checked by a human editor.