[Paper Explainer] MIND: Unified Inquiry and Diagnosis RL with Criteria Grounded Clinical Supports for Psychiatric Consultation
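MIND introduces a unified inquiry-diagnosis reinforcement learning framework for psychiatric consultation. It combines three components to improve diagnostic accuracy, empathetic interaction, and interpretability in multi-turn dialogue: a Criteria-Grounded Psychiatric Reasoning Bank (PRB), rubric-based process supervision, and value-aware trajectory rectification.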
Large language models (LLMs) have advanced medical dialogue systems, yet psychiatric consultation poses substantially higher demands due to subjective ambiguity and comorbidity complexity: an agent must continuously extract psychopathological cues from incomplete and inconsistent patient reports in multi-turn interactions and perform rigorous differential diagnostic reasoning. However, existing methods face two fundamental challenges. First, without criteria-grounded clinical supports, they are prone to unsupported clinical assertions when symptoms are atypical or underspecified. Second, in multi-turn interactions, they struggle to mitigate inquiry drift (off-topic or low-yield questioning) and optimize questioning strategies. To address these challenges, we propose MIND, a unified inquiry-diagnosis reinforcement learning framework for psychiatric consultation. Specifically, we build a Criteria-Grounded Psychiatric Reasoning Bank (PRB) that summarizes dialogue context into clinical retrieval states, retrieves semantically similar reference consultations, and distills reusable criteria-grounded clinical supports to guide criteria-aligned inquiry and reasoning. Building on this foundation, MIND enforces explicit clinical reasoning with rubric-based process rewards to provide fine-grained supervision over intermediate decision steps, and incorporates a value-aware trajectory rectification mechanism to jointly improve information acquisition and diagnostic decision-making across turns. Extensive experiments demonstrate that MIND consistently outperforms strong baselines in diagnostic accuracy, empathetic interaction quality, interpretability, and generalization.
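The following is a minimal sketch of the PRB lookup described above. It covers three steps: summarize the dialogue into a retrieval state, retrieve the most similar reference consultations, and return their distilled supports. It is an illustration under stated assumptions, not the authors' implementation. The bag-of-words `embed` function stands in for a learned encoder, and `summarize_state` simply concatenates turns in place of the paper's summarization step. `BankEntry`, `retrieve_supports`, and the toy bank entries are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class BankEntry:
    """Hypothetical PRB record: a reference consultation's retrieval state and its distilled supports."""
    retrieval_state: str
    supports: list[str]


def embed(text: str) -> Counter:
    """Bag-of-words stand-in for a learned sentence encoder (assumption)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def summarize_state(dialogue: list[str]) -> str:
    """Placeholder for the paper's summarization step: here we just concatenate the turns."""
    return " ".join(dialogue)


def retrieve_supports(dialogue: list[str], bank: list[BankEntry], k: int = 2) -> list[str]:
    """Return deduplicated supports from the k bank entries most similar to the current state."""
    query = embed(summarize_state(dialogue))
    ranked = sorted(bank, key=lambda e: cosine(query, embed(e.retrieval_state)), reverse=True)
    supports: list[str] = []
    for entry in ranked[:k]:
        for support in entry.supports:
            if support not in supports:
                supports.append(support)
    return supports


# Toy, illustrative bank entries (hypothetical, not from the paper).
bank = [
    BankEntry("low mood anhedonia poor sleep two weeks",
              ["Check duration of at least two weeks", "Ask about suicidal ideation"]),
    BankEntry("racing thoughts reduced need for sleep elevated mood",
              ["Screen for prior depressive episodes", "Ask about decreased need for sleep"]),
    BankEntry("excessive worry restlessness muscle tension",
              ["Check whether worry has lasted six months"]),
]
dialogue = ["I have felt low mood for weeks", "I lost interest in things and sleep poorly"]
print(retrieve_supports(dialogue, bank, k=1))
```

In a full pipeline, the returned supports would be injected into the next turn's prompt. Deduplicating across the top-k entries keeps that prompt short when neighboring consultations share hints.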
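Motivation and Goals
- Reduce unsupported clinical assertions by grounding reasoning in evidence supports built on criteria from clinical guidelines and the literature.
- Reduce inquiry drift in multi-turn psychiatric consultation by using retrieval and process supervision to steer the agent toward informative questions.
- Jointly optimize information gathering and diagnostic decision-making with reinforcement learning, using explicit reasoning trajectories and structured rewards.
- Make AI-assisted psychiatric consultation interpretable through explicit reasoning trajectories and clinically aligned prompts.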
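Proposed Method
- Build a Criteria-Grounded Psychiatric Reasoning Bank (PRB) that stores retrieval states and criteria-aligned supports.
- Use retrieval-augmented generation to inject PRB supports as turn-level prompts, enabling criteria-aligned questioning.
- Enforce explicit clinical reasoning through rubric-based process rewards that score symptom analysis, differential considerations, and decision logic.
- Introduce a value-aware trajectory rectification mechanism that detects low-yield turns and triggers either a self-retry or a PRB-guided fallback.
- Move from staged supervised fine-tuning to an RL pipeline that combines turn-level process signals with a terminal diagnosis reward.
- Evaluate on patient simulators spanning psychiatric categories, comparing against baselines on diagnostic accuracy, interaction quality, and support credibility.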
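The following sketch shows the reward and rectification steps in the list above together. Everything in it is an illustrative assumption:

- the rubric weights;
- the scalar `value` attached to each turn (this summary does not say how turn value is estimated);
- the 0.5 threshold;
- the 50/50 mix of mean process reward and terminal reward.

`Turn`, `process_reward`, `rectify`, and `trajectory_return` are hypothetical names. In practice the rubric scores would presumably come from a grader model rather than being written by hand.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical rubric mirroring the three checks named above; weights sum to 1.
RUBRIC = {"symptom_analysis": 0.4, "differential": 0.3, "decision_logic": 0.3}


@dataclass
class Turn:
    question: str
    value: float                      # assumed scalar estimate of the turn's information value
    rubric_scores: dict[str, float]   # per-item scores in [0, 1], e.g. from a grader model


def process_reward(scores: dict[str, float]) -> float:
    """Weighted rubric score in [0, 1]; missing items score 0, out-of-range scores are clipped."""
    return sum(w * min(max(scores.get(item, 0.0), 0.0), 1.0) for item, w in RUBRIC.items())


def rectify(turn: Turn, retry: Callable[[], Turn], fallback: Callable[[], Turn],
            threshold: float = 0.5) -> tuple[Turn, str]:
    """Keep a high-value turn; otherwise self-retry once, then fall back to a PRB-guided turn."""
    if turn.value >= threshold:
        return turn, "keep"
    candidate = retry()
    if candidate.value >= threshold:
        return candidate, "self_retry"
    return fallback(), "prb_fallback"


def trajectory_return(turns: list[Turn], diagnosis_correct: bool,
                      process_weight: float = 0.5) -> float:
    """Mix the mean turn-level process reward with a binary terminal diagnosis reward."""
    mean_process = sum(process_reward(t.rubric_scores) for t in turns) / max(len(turns), 1)
    terminal = 1.0 if diagnosis_correct else 0.0
    return process_weight * mean_process + (1.0 - process_weight) * terminal


# Toy usage with hand-written scores (hypothetical).
off_topic = Turn("How was your commute today?", 0.1, {"symptom_analysis": 0.2})
better = Turn("Have you lost interest in things you used to enjoy?", 0.8,
              {"symptom_analysis": 1.0, "differential": 0.5, "decision_logic": 0.5})
prb_turn = Turn("How long has your mood been low?", 0.9,
                {"symptom_analysis": 1.0, "differential": 1.0, "decision_logic": 1.0})

kept, action = rectify(off_topic, retry=lambda: better, fallback=lambda: prb_turn)
print(action, round(trajectory_return([kept, prb_turn], diagnosis_correct=True), 3))
```

The sketch averages the process rewards rather than summing them. This is one way to stop longer dialogues from earning more reward simply by asking more questions. The paper may weight these signals differently.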
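Experimental Results
Research Questions
- RQ1: Does grounding inquiry in the Criteria-Grounded PRB improve diagnostic accuracy in multi-turn psychiatric consultation?
- RQ2: Do rubric-based process supervision and value-aware rectification reduce inquiry drift and improve information acquisition?
- RQ3: How does PRB-guided retrieval affect the quality and reliability of clinical reasoning in AI-assisted psychiatric interviews?
- RQ4: How does MIND compare with strong baselines in empathy, interpretability, and robustness across psychiatric categories?
Main Findings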
| Model | IC | RC | FC (%) | HL |
|---|---|---|---|---|
| GLM-4-9B | 7.3 | 7.1 | 0.0 | 6.5 |
| HuatuoGPT-o1-7B | 8.5 | 8.2 | 0.0 | 7.8 |
| Qwen3-8B | 8.9 | 8.6 | 0.0 | 8.1 |
| Qwen3-8B † | 8.0 | 7.9 | 27.0 | 8.4 |
| Qwen3-32B | 8.? | ? | ? | ? |
| Qwen3-32B † | 8.0 | 8.1 | ? | 8.0 |
| Baichuan-M2 | 8.1 | 8.2 | ? | ? |
| DDT | 54.5 | 50.7 | 55.9 | ? |
| MRD-RAG | 61.5 | 56.8 | 55.9 | ? |
| Fine-tuned Qwen3-4B † | 60.0 | 54.0 | 12.0 | 38.0 |
| Qwen3-8B † | 69.2 | 63.4 | 66.1 | 68.0 |
| DoctorAgent-RL | 58.5 | 53.5 | 55.9 | 52.0 |
| DDO | 59.5 | 53.0 | 56.1 | 46.0 |
| Ours (MIND-4B) | 62.0 | 65.0 | 56.0 | 52.0 |
| Ours (MIND-8B) | 72.9 | 70.0 | 71.4 | 61.9 |
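- On both patient simulators, MIND outperforms the baselines in diagnostic accuracy and in performance across diagnostic categories.
- PRB-based retrieval supplies criteria-aligned decision cues, reducing missed findings and irrelevant questions.
- Rubric-based process supervision aligns turn-level reasoning with clinical checks (symptom analysis, differential diagnosis and exclusion, decision logic).
- Value-aware trajectory rectification, through self-retry and PRB-guided fallback, reduces inquiry drift and improves both stability and the reliability of the final diagnosis.
- The support-credibility evaluation shows that MIND aligns retrieved supports with the patient's context better than many baseline models.
- After fine-tuning and RL optimization, MIND is robust across multiple evaluation metrics.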
This explainer was generated by AI and reviewed by human editors.