QUICK REVIEW

[Paper Review] Multi-dimensional Assessment and Explainable Feedback for Counselor Responses to Client Resistance in Text-based Counseling with LLMs

Anqi Li, Ruihan Wang|arXiv (Cornell University)|Feb 25, 2026

Mental Health via Writing | Citations: 0

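One-Line Summary

This paper proposes a four-dimensional framework for evaluating counselor responses to client resistance in text-based counseling, builds an expert-annotated dataset, and fine-tunes a Llama-3.1 model that outperforms baselines and can generate explanations. A proof-of-concept study shows that AI-generated feedback improves the quality of counselors' responses.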

ABSTRACT

Effectively addressing client resistance is a sophisticated clinical skill in psychological counseling, yet practitioners often lack timely and scalable supervisory feedback to refine their approaches. Although current NLP research has examined overall counseling quality and general therapeutic skills, it fails to provide granular evaluations of high-stakes moments where clients exhibit resistance. In this work, we present a comprehensive pipeline for the multi-dimensional evaluation of human counselors' interventions specifically targeting client resistance in text-based therapy. We introduce a theory-driven framework that decomposes counselor responses into four distinct communication mechanisms. Leveraging this framework, we curate and share an expert-annotated dataset of real-world counseling excerpts, pairing counselor-client interactions with professional ratings and explanatory rationales. Using this data, we perform full-parameter instruction tuning on a Llama-3.1-8B-Instruct backbone to model fine-grained evaluative judgments of response quality and generate the explanations underlying them. Experimental results show that our approach can effectively distinguish the quality of different communication mechanisms (77-81% F1), substantially outperforming GPT-4o and Claude-3.5-Sonnet (45-59% F1). Moreover, the model produces high-quality explanations that closely align with expert references and receive near-ceiling ratings from human experts (2.8-2.9/3.0). A controlled experiment with 43 counselors further confirms that receiving this AI-generated feedback significantly improves counselors' ability to respond effectively to client resistance.

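Motivation and Objectives

Build a theory-driven, multi-dimensional framework for evaluating counselor responses to client resistance in text-based counseling.
Create an expert-annotated dataset of counselor interventions in response to resistance, with explanatory rationales.
Train a large language model to provide fine-grained assessments and interpretable explanations.
Demonstrate the practical usefulness of AI-generated feedback for improving counselor performance in resistance contexts.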

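Proposed Method

Propose a four-dimensional framework: Respect for Autonomy, Stance Alignment, Emotional Resonance, and Conversational Orientation, each rated at one of three expression levels (none, weak, strong).
Build an expert-annotated dataset with explanations from ClientBehavior and ObserverWAI dialogues in which client resistance is detected and paired with the counselor's response.
Fine-tune all parameters of Llama-3.1-8B-Instruct on the task, using 5-fold cross-validation and oversampling to address class imbalance.
Evaluate classification performance against baselines (including GPT-4o and Claude-3.5-Sonnet) using macro F1 and accuracy; evaluate explanation quality with automatic metrics (BLEU/ROUGE) and human ratings.
Conduct a proof-of-concept study with 43 counselors, using linear mixed-effects models to test the effectiveness of AI-generated feedback.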
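The cross-validation and oversampling step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the review only says that 5-fold cross-validation and oversampling were used, so the choice of plain random duplication, the decision to balance only the training portion of each fold, and all function names here are hypothetical.

```python
import random
from collections import Counter, defaultdict


def k_fold_indices(n_items, k=5, seed=0):
    """Shuffle indices 0..n_items-1 and deal them into k near-equal folds."""
    order = list(range(n_items))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


def oversample(examples, seed=0):
    """Duplicate minority-class examples at random until all classes are equal in size.

    `examples` is a list of (text, label) pairs. Plain random oversampling is
    an assumption; the review does not say which variant was used.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for example in examples:
        by_label[example[1]].append(example)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    rng.shuffle(balanced)
    return balanced


def cross_validation_splits(examples, k=5, seed=0):
    """Yield (balanced_train, untouched_test) pairs, one per fold."""
    folds = k_fold_indices(len(examples), k, seed)
    for held_out in range(k):
        test = [examples[i] for i in folds[held_out]]
        train = [examples[i]
                 for j, fold in enumerate(folds) if j != held_out
                 for i in fold]
        # Balance the training data only, so the held-out fold keeps
        # the original class distribution.
        yield oversample(train, seed), test
```

Oversampling after the split, rather than before it, keeps duplicated examples from leaking into the held-out fold, which would otherwise inflate the reported scores.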

Figure 1: Overview of our framework for evaluating counselor responses to client resistance. The framework comprises four core communication mechanisms: Respect for Autonomy, Stance Alignment, Emotional Resonance, and Conversational Orientation. For each mechanism, responses are further categori
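One way to picture the rating scheme is as a small data structure: four mechanisms, each carrying one of three expression levels plus an explanation. The sketch below is illustrative only. The class names, field names, and the integer codes 0 to 2 are assumptions, not the dataset's actual format.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Three expression levels; the integer codes are an assumption."""
    NONE = 0
    WEAK = 1
    STRONG = 2


# The four communication mechanisms named in the framework.
MECHANISMS = (
    "respect_for_autonomy",
    "stance_alignment",
    "emotional_resonance",
    "conversational_orientation",
)


@dataclass(frozen=True)
class MechanismRating:
    level: Level
    explanation: str  # rationale for the assigned level


@dataclass(frozen=True)
class ResponseAssessment:
    client_utterance: str      # the resistant client turn
    counselor_response: str    # the response being rated
    ratings: dict              # mechanism name -> MechanismRating

    def __post_init__(self):
        # Require exactly one rating per mechanism.
        missing = set(MECHANISMS) - set(self.ratings)
        extra = set(self.ratings) - set(MECHANISMS)
        if missing or extra:
            raise ValueError(
                f"missing={sorted(missing)}, unexpected={sorted(extra)}"
            )
```

Rating every response on all four mechanisms at once is what makes the assessment multi-dimensional: a single response can be strong on one mechanism and absent on another.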

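Experimental Results

Research Questions

RQ1: Can the multi-dimensional framework reliably distinguish the expression levels of counselor responses across the four communication mechanisms in resistance contexts?
RQ2: Does task-specific fine-tuning with explanations improve classification and explanation quality beyond label-only training?
RQ3: Are AI-generated explanations and feedback useful and beneficial for real-time counselor training and skill development?

Key Findings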

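Model	Respect for Autonomy F1 (%)	Respect for Autonomy ACC (%)	Stance Alignment F1 (%)	Stance Alignment ACC (%)	Emotional Resonance F1 (%)	Emotional Resonance ACC (%)	Conversational Orientation F1 (%)	Conversational Orientation ACC (%)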
Our Model	80.92 ± 1.55	87.06 ± 0.85	77.56 ± 1.78	84.06 ± 1.37	77.34 ± 3.67	78.68 ± 3.32	77.87 ± 0.54	77.64 ± 0.56
Explanations	73.24 ± 1.92	83.38 ± 0.57	70.17 ± 1.30	80.74 ± 0.42	73.23 ± 2.67	75.52 ± 3.21	73.21 ± 1.48	74.15 ± 1.73
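For readers unfamiliar with the two metrics in the table, the sketch below shows how macro F1 and accuracy are computed for a three-level label. The toy labels are invented for illustration and are not taken from the dataset.

```python
def macro_f1(gold, predicted, labels=("none", "weak", "strong")):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(gold, predicted))
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        denom = 2 * tp + fp + fn
        # A class that is never gold and never predicted scores 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def accuracy(gold, predicted):
    """Fraction of items whose predicted label equals the gold label."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# Invented, imbalanced toy data: a classifier that always predicts "strong".
gold = ["strong"] * 8 + ["weak", "none"]
predicted = ["strong"] * 10
print(accuracy(gold, predicted), macro_f1(gold, predicted))
```

Because macro F1 weights every class equally, a classifier that ignores rare levels can still post high accuracy while scoring poorly on macro F1. This may be part of why accuracy exceeds F1 in several columns of the table.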

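Our model achieves macro F1 of 77.34–80.92% and accuracy of 77.64–87.06% across the four mechanisms, exceeding GPT-4o and Claude-3.5-Sonnet by more than 20 F1 points.
Incorporating explanations during training yields gains of at least about 4 F1 points over label-only training.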
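Generated explanations show strong lexical alignment with expert references (BLEU-1 = 0.60) and receive near-ceiling human ratings of 2.8–2.9/3.0 for framework consistency, evidence anchoring, and clarity/specificity, indicating high-quality, actionable feedback.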
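In the controlled experiment, counselors who received AI-generated feedback improved the quality of their responses to resistance significantly more than the control group (group-by-phase interaction effects across all four dimensions).
Annotation reliability is substantial across the four mechanisms (Cohen's κ = 0.74–0.77), and the annotations are accompanied by high-quality explanatory rationales.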
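Cohen's κ corrects raw agreement between two annotators for the agreement expected by chance. The sketch below computes the unweighted statistic on invented labels. Whether the authors used an unweighted or a weighted variant for the ordinal levels is not stated in this review, so the unweighted form is an assumption.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented ratings from two hypothetical annotators.
annotator_1 = ["strong", "strong", "strong", "weak", "weak",
               "none", "none", "strong", "weak", "none"]
annotator_2 = ["strong", "strong", "weak", "weak", "weak",
               "none", "none", "strong", "weak", "strong"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))
```

On the commonly used Landis and Koch scale, values between 0.61 and 0.80 are labelled "substantial", which is the band the reported 0.74–0.77 falls into.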

Figure 2: Interaction effects between experimental groups and phases across four dimensions. Solid green lines represent the control group, while dashed orange lines represent the experimental group. Points denote the mean values, and error bars indicate 95% confidence intervals. The results reveal
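A group-by-phase interaction in a linear mixed-effects model can be written as below. This is one plausible specification, not the authors' exact model: the review states only that linear mixed-effects models were used, so the random intercept per counselor and the 0/1 coding of group and phase are assumptions.

```latex
\[
y_{ip} = \beta_0 + \beta_1\,\mathrm{Group}_i + \beta_2\,\mathrm{Phase}_p
       + \beta_3\,(\mathrm{Group}_i \times \mathrm{Phase}_p) + u_i + \varepsilon_{ip},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2),\quad
\varepsilon_{ip} \sim \mathcal{N}(0, \sigma^2)
\]
```

Here $y_{ip}$ is the score of counselor $i$ in phase $p$ on one dimension, $\mathrm{Group}_i = 1$ for the experimental group, $\mathrm{Phase}_p = 1$ for the later phase, and $u_i$ is a counselor-specific random intercept. Under this coding, $\beta_3$ is the extra change between phases in the experimental group relative to the control group, so a significant $\beta_3$ corresponds to the interaction effect the figure describes.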

This review was generated by AI and checked by a human editor.