QUICK REVIEW

[Paper Review] MIND: Unified Inquiry and Diagnosis RL with Criteria Grounded Clinical Supports for Psychiatric Consultation

Guoyi Li, Shihao Xu | arXiv (Cornell University) | Mar 4, 2026

Machine Learning in Healthcare | Citations: 0

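One-Sentence Summary

MIND introduces a unified inquiry–diagnosis reinforcement learning framework for psychiatric consultation that uses a Criteria-Grounded Psychiatric Reasoning Bank (PRB), rubric-based process supervision, and value-aware trajectory rectification to improve diagnostic accuracy, empathetic interaction, and interpretability in multi-turn dialogue.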

ABSTRACT

Large language models (LLMs) have advanced medical dialogue systems, yet psychiatric consultation poses substantially higher demands due to subjective ambiguity and comorbidity complexity: an agent must continuously extract psychopathological cues from incomplete and inconsistent patient reports in multi-turn interactions and perform rigorous differential diagnostic reasoning. However, existing methods face two fundamental challenges. First, without criteria-grounded clinical supports, they are prone to unsupported clinical assertions when symptoms are atypical or underspecified. Second, in multi-turn interactions, they struggle to mitigate inquiry drift (off-topic or low-yield questioning) and optimize questioning strategies. To address these challenges, we propose MIND, a unified inquiry–diagnosis reinforcement learning framework for psychiatric consultation. Specifically, we build a Criteria-Grounded Psychiatric Reasoning Bank (PRB) that summarizes dialogue context into clinical retrieval states, retrieves semantically similar reference consultations, and distills reusable criteria-grounded clinical supports to guide criteria-aligned inquiry and reasoning. Building on this foundation, MIND enforces explicit clinical reasoning with rubric-based process rewards to provide fine-grained supervision over intermediate decision steps, and incorporates a value-aware trajectory rectification mechanism to jointly improve information acquisition and diagnostic decision-making across turns. Extensive experiments demonstrate that MIND consistently outperforms strong baselines in diagnostic accuracy, empathetic interaction quality, interpretability, and generalization.
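
The retrieval step the abstract describes (summarize the dialogue into a retrieval state, find similar reference consultations, return their distilled supports) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `BankEntry`, `embed`, and `retrieve_supports` are hypothetical names, the bag-of-words vector stands in for whatever semantic encoder the authors use, and the support strings are placeholders rather than clinical guidance.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class BankEntry:
    """One hypothetical PRB record: a retrieval state plus its distilled support."""
    retrieval_state: str
    support: str


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real sentence encoder."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_supports(state: str, bank: list[BankEntry], k: int = 2) -> list[str]:
    """Return the supports of the k bank entries most similar to the current state."""
    query = embed(state)
    ranked = sorted(
        bank,
        key=lambda entry: cosine(query, embed(entry.retrieval_state)),
        reverse=True,
    )
    return [entry.support for entry in ranked[:k]]


# Placeholder bank contents, invented for this example only.
bank = [
    BankEntry("low mood anhedonia insomnia two weeks",
              "Ask about duration and functional impairment."),
    BankEntry("elevated mood reduced sleep racing thoughts",
              "Screen for prior manic or hypomanic episodes."),
    BankEntry("excessive worry muscle tension restlessness",
              "Ask whether worry spans multiple domains."),
]
hints = retrieve_supports("patient reports low mood and insomnia", bank, k=1)
print(hints)
```

In a real system the retrieved supports would be injected into the prompt as turn-level hints; here they are simply printed.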

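Motivation and Objectives

Reduce unsupported clinical assertions by grounding reasoning in criteria-based supports drawn from guidelines and the literature.
Mitigate inquiry drift in multi-turn psychiatric consultation and encourage informative questions through retrieval and process supervision.
Jointly optimize information gathering and diagnostic decision-making through explicit reasoning traces and reinforcement learning with structured rewards.
Enable interpretable AI-assisted psychiatric consultation through explicit reasoning traces and clinically aligned prompts.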

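Proposed Method

Build a Criteria-Grounded Psychiatric Reasoning Bank (PRB) that stores retrieval states and criteria-aligned supports.
Use retrieval-augmented generation to inject PRB supports as turn-level hints that prompt criteria-aligned questions.
Enforce explicit clinical reasoning with rubric-based process rewards that score symptom analysis, differential diagnosis/exclusion, and conclusion logic.
Incorporate a value-aware trajectory rectification mechanism that detects low-utility turns and triggers a self-retry or a PRB-guided fallback.
Train in stages, with unsupervised fine-tuning followed by an RL pipeline that combines turn-level process signals with a final diagnostic reward.
Evaluate with patient simulators for each psychiatric category, comparing diagnostic accuracy, dialogue quality, and support reliability against baselines.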
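
As a rough illustration of how turn-level rubric scores might be combined with a final diagnostic reward, the sketch below averages a weighted rubric score over turns and blends it with a 0/1 outcome reward. The review does not give the actual reward formula, so the rubric weights, the `process_weight` blend factor, and the function names are all hypothetical; only the three rubric dimensions come from the list above.

```python
# Hypothetical rubric weights; the three dimensions are the ones named in the method list.
RUBRIC_WEIGHTS = {
    "symptom_analysis": 0.4,
    "differential": 0.4,
    "conclusion_logic": 0.2,
}


def process_reward(scores: dict[str, float]) -> float:
    """Weighted rubric score for one turn; each score is expected in [0, 1]."""
    return sum(RUBRIC_WEIGHTS[name] * scores[name] for name in RUBRIC_WEIGHTS)


def episode_return(
    turn_scores: list[dict[str, float]],
    diagnosis_correct: bool,
    process_weight: float = 0.5,  # assumed blend factor, not from the paper
) -> float:
    """Blend the mean turn-level process reward with a 0/1 final diagnosis reward."""
    if turn_scores:
        mean_process = sum(process_reward(s) for s in turn_scores) / len(turn_scores)
    else:
        mean_process = 0.0
    final = 1.0 if diagnosis_correct else 0.0
    return process_weight * mean_process + (1.0 - process_weight) * final


# Two illustrative turns with made-up rubric scores.
turns = [
    {"symptom_analysis": 1.0, "differential": 0.5, "conclusion_logic": 0.0},
    {"symptom_analysis": 0.5, "differential": 0.5, "conclusion_logic": 1.0},
]
score = episode_return(turns, diagnosis_correct=True)
print(round(score, 3))
```

A linear blend is only one option; a per-turn discounted sum or a reward that gates the process term on a correct diagnosis would fit the same description.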

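Experimental Results

Research Questions

RQ1: Do inquiries grounded in the Criteria-Grounded PRB improve diagnostic accuracy in multi-turn psychiatric consultation?
RQ2: Can rubric-based process supervision and value-aware rectification reduce inquiry drift and improve information acquisition?
RQ3: How does PRB-guided retrieval affect the quality and reliability of clinical reasoning in AI-assisted psychiatric interviews?
RQ4: How does MIND compare with comparably strong baselines in empathy, interpretability, and robustness across different psychiatric categories?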

Key Findings

Model	IC	RC	FC (%)	HL
GLM-4-9B	7.3	7.1	0.0	6.5
HuatuoGPT-o1-7B	8.5	8.2	0.0	7.8
Qwen3-8B	8.9	8.6	0.0	8.1
Qwen3-8B †	8.0	7.9	27.0	8.4
Qwen3-32B	8.?	?	?	?
Qwen3-32B †	8.0	8.1	?	8.0
Baichuan-M2	8.1	8.2	?	?
DDT	54.5	50.7	55.9	?
MRD-RAG	61.5	56.8	55.9	?
Fine-tuned Qwen3-4B †	60.0	54.0	12.0	38.0
Qwen3-8B †	69.2	63.4	66.1	68.0
DoctorAgent-RL	58.5	53.5	55.9	52.0
DDO	59.5	53.0	56.1	46.0
Ours (MIND-4B)	62.0	65.0	56.0	52.0
Ours (MIND-8B)	72.9	70.0	71.4	61.9

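MIND improved diagnostic accuracy and per-category performance over baselines on two patient simulators.
PRB-based retrieval provides criteria-aligned decision cues, reducing missed checks and inappropriate questions.
Rubric-based process supervision aligns turn-level reasoning with clinical checks (symptom analysis, differential diagnosis/exclusion, decision logic).
Value-aware trajectory rectification reduces inquiry drift through self-retry and PRB-guided fallback, improving stability and the reliability of the final diagnosis.
The support reliability evaluation shows that retrieved supports fit the patient context better than those of most baselines.
The fine-tuned and RL-optimized variants show robust performance across multiple evaluation metrics.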
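
The self-retry and PRB-guided fallback behaviour described above can be pictured as a small control loop. This is a sketch under stated assumptions, not the authors' mechanism: `rectify_turn`, the `threshold` and `max_retries` parameters, and the toy value function are all hypothetical, and a real system would use a learned value estimate and a PRB-backed fallback generator.

```python
from typing import Callable


def rectify_turn(
    propose: Callable[[], str],
    value_of: Callable[[str], float],
    fallback: Callable[[], str],
    threshold: float = 0.5,  # assumed cutoff for a "low-utility" turn
    max_retries: int = 2,    # assumed retry budget
) -> tuple[str, str]:
    """Return (question, source), where source is 'initial', 'retry', or 'fallback'."""
    question = propose()
    if value_of(question) >= threshold:
        return question, "initial"
    # Low estimated value: let the policy try again a bounded number of times.
    for _ in range(max_retries):
        question = propose()
        if value_of(question) >= threshold:
            return question, "retry"
    # Retries exhausted: hand over to the (here stubbed) PRB-guided fallback.
    return fallback(), "fallback"


# Toy stand-ins for the policy, the value estimate, and the fallback.
candidates = iter(["How is the weather?", "How long has the low mood lasted?"])


def toy_value(question: str) -> float:
    """Pretend value estimate: on-topic questions mention 'mood'."""
    return 0.9 if "mood" in question else 0.1


question, source = rectify_turn(
    propose=lambda: next(candidates),
    value_of=toy_value,
    fallback=lambda: "PRB-guided question",
)
print(source, "->", question)
```

Bounding the retries keeps the per-turn cost predictable, and the fallback guarantees that some question is always produced even when the policy keeps drifting.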

This review was generated by AI and checked by a human editor.