QUICK REVIEW

[Paper Digest] To Search or Not to Search: Aligning the Decision Boundary of Deep Search Agents via Causal Intervention

Wenlin Zhang, Kuicai Dong|arXiv (Cornell University)|Feb 3, 2026

Explainable Artificial Intelligence (XAI) | Cited by 0

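One-Sentence Summary

The paper identifies misaligned decision boundaries in deep search agents, which show up as over-search and under-search, and proposes a causal-intervention-based framework: a diagnosis step that detects these errors, and DAS, which aligns when to search and when to answer, improving both accuracy and efficiency.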

ABSTRACT

Deep search agents, which autonomously iterate through multi-turn web-based reasoning, represent a promising paradigm for complex information-seeking tasks. However, current agents suffer from critical inefficiency: they conduct excessive searches as they cannot accurately judge when to stop searching and start answering. This stems from outcome-centric training that prioritizes final results over the search process itself. We identify the root cause as misaligned decision boundaries, the threshold determining when accumulated information suffices to answer. This causes over-search (redundant searching despite sufficient knowledge) and under-search (premature termination yielding incorrect answers). To address these errors, we propose a comprehensive framework comprising two key components. First, we introduce causal intervention-based diagnosis that identifies boundary errors by comparing factual and counterfactual trajectories at each decision point. Second, we develop Decision Boundary Alignment for Deep Search agents (DAS), which constructs preference datasets from causal feedback and aligns policies via preference optimization. Experiments on public datasets demonstrate that decision boundary errors are pervasive across state-of-the-art agents. Our DAS method effectively calibrates these boundaries, mitigating both over-search and under-search to achieve substantial gains in accuracy and efficiency. Our code and data are publicly available at: https://github.com/Applied-Machine-Learning-Lab/WWW2026_DAS.

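Research Motivation and Goals

Formally define the decision boundary of deep search agents and its two error modes: over-search and under-search.
Diagnose decision boundary errors by using causal intervention (the do-operator) to compare factual and counterfactual trajectories.
Propose Decision Boundary Alignment (DAS), which learns from causal feedback via preference optimization.
Demonstrate that DAS improves accuracy and efficiency across multiple QA datasets and model scales.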
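The diagnosis idea in the goals above can be illustrated with a minimal sketch. It is not the authors' implementation: the `DecisionPoint` type, the `rollout` callback, the label names, and the labeling rule are all assumptions made for illustration. The rule assumed here is that a search step counts as over-search if forcing an answer at that step would already be correct, and a wrong answer counts as under-search if forcing one more search at that step would lead to a correct answer. The rates computed at the end are simple fractions of decision points and may differ from the paper's OSR and USR definitions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

SEARCH, ANSWER = "search", "answer"


@dataclass
class DecisionPoint:
    step: int              # index of the decision within the trajectory
    action: str            # factual action the agent took: SEARCH or ANSWER
    factual_correct: bool  # did the factual trajectory end with a correct answer?


# Hypothetical rollout: force `action` at `step` (the do-intervention) and
# report whether the resulting counterfactual trajectory answers correctly.
Rollout = Callable[[int, str], bool]


def diagnose(point: DecisionPoint, rollout: Rollout) -> str:
    """Label one decision point by comparing factual and counterfactual outcomes."""
    if point.action == SEARCH:
        # do(action = answer): if answering now is already correct,
        # the factual search was redundant.
        return "over_search" if rollout(point.step, ANSWER) else "no_error"
    # Factual action was ANSWER.
    # do(action = search): if a wrong answer becomes correct after searching,
    # the agent stopped too early.
    if not point.factual_correct and rollout(point.step, SEARCH):
        return "under_search"
    return "no_error"


def error_rates(points: List[DecisionPoint], rollout: Rollout) -> Dict[str, float]:
    """Fraction of decision points labeled as each error type (illustrative only)."""
    labels = [diagnose(p, rollout) for p in points]
    n = len(labels)
    if n == 0:
        return {"over_search": 0.0, "under_search": 0.0}
    return {
        "over_search": labels.count("over_search") / n,
        "under_search": labels.count("under_search") / n,
    }


# Toy data: counterfactual outcomes are looked up in a table instead of
# running a real agent.
counterfactual_table = {(0, ANSWER): True, (1, ANSWER): False, (2, SEARCH): True}


def toy_rollout(step: int, action: str) -> bool:
    return counterfactual_table.get((step, action), False)


points = [
    DecisionPoint(step=0, action=SEARCH, factual_correct=True),
    DecisionPoint(step=1, action=SEARCH, factual_correct=True),
    DecisionPoint(step=2, action=ANSWER, factual_correct=False),
    DecisionPoint(step=3, action=ANSWER, factual_correct=True),
]
rates = error_rates(points, toy_rollout)
```

In a real setting, `rollout` would re-run the agent from the intervened state, which is the expensive part; the lookup table here only stands in for that call.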

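Proposed Method

Model the decision boundary formally in terms of a latent knowledge state (sufficient/insufficient) and an action (search/answer).
Use causal intervention (the do-operator) to generate counterfactual trajectories and diagnose whether each decision was optimal.
Construct a preference dataset from causal feedback, pairing the preferred counterfactual trajectory with the rejected factual one.
Fine-tune the policy on the constructed preferences with Direct Preference Optimization (DPO).
Train on a dataset of 20,000 preference examples drawn from NQ and HotpotQA, running 3 rounds of DAS training with LoRA fine-tuning.
Evaluate on NQ, HotpotQA, and 2WikiMultiHopQA using EM, total inference time, ASQ, OSR, and USR.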
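The pairing and optimization steps above can be sketched as follows. This is a hedged illustration, not the released DAS code: `PreferencePair`, `build_pair`, and the label strings are hypothetical names, and `beta=0.1` is an assumed default rather than a value reported in this digest. The loss is the standard per-example DPO objective, computed from summed log-probabilities of the chosen and rejected continuations under the policy and a frozen reference model.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class PreferencePair:
    prompt: str    # context up to the decision point
    chosen: str    # preferred continuation (the counterfactual branch)
    rejected: str  # dispreferred continuation (the factual branch)


def build_pair(context: str, factual: str, counterfactual: str,
               label: str) -> Optional[PreferencePair]:
    """Turn a diagnosed boundary error into a preference pair.

    Only decision points diagnosed as errors produce a pair; the label
    names are assumptions for this sketch.
    """
    if label not in ("over_search", "under_search"):
        return None
    return PreferencePair(prompt=context, chosen=counterfactual, rejected=factual)


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one pair: -log(sigmoid(margin))."""
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow.
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# Toy usage: an over-search case where answering directly is preferred.
pair = build_pair(
    context="Q: ... (retrieved evidence so far)",
    factual="<search> another query </search>",
    counterfactual="<answer> final answer </answer>",
    label="over_search",
)
loss_at_init = dpo_loss(-2.0, -2.0, -2.0, -2.0)        # policy equals reference
loss_after_update = dpo_loss(-1.0, -3.0, -2.0, -2.0)   # policy now favors chosen
```

When the policy equals the reference model, the margin is zero and the loss is log 2; it falls as the policy assigns relatively more probability to the chosen branch. In practice the log-probabilities would come from a language model, and a library trainer would typically handle batching and LoRA adapters.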

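Experimental Results

Research Questions

RQ1: Do decision boundary errors (OSR/USR) exist in state-of-the-art deep search agents?
RQ2: How do task characteristics affect decision boundary errors?
RQ3: Does Decision Boundary Alignment (DAS) reduce OSR/USR and improve accuracy and efficiency?
RQ4: What is the relationship between an agent's knowledge boundary and its decision boundary?
RQ5: How does the number of reasoning steps affect the prevalence of decision boundary errors?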

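Key Findings

Decision boundary errors (measured by OSR and USR) are pervasive across models and workflows.
Outcome-based reinforcement learning can improve accuracy but often raises search cost, exposing a trade-off between accuracy and efficiency.
DAS consistently improves EM while lowering both OSR and USR across QA datasets and model scales.
Ablation experiments show that balancing the over-search and under-search signals together is necessary for the best performance.
There is a knowledge-decision gap: agents are weak at self-assessing when to stop searching and rely on their internal knowledge.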


This digest was generated by AI and reviewed by a human editor.