QUICK REVIEW

[Paper Review] RE-TRAC: REcursive TRAjectory Compression for Deep Search Agents

Jialiang Zhu, Gongrui Zhang|arXiv (Cornell University)|Feb 2, 2026

Multimodal Machine Learning Applications | Cited by 0

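One-Sentence Summary

RE-TRAC introduces recursive trajectory compression so that ReAct-style deep search agents can reflect across trajectories and plan globally, improving long-horizon search. With frontier LLMs it outperforms ReAct by 15–20% on BrowseComp, and the paper also provides a training recipe for smaller models.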

ABSTRACT

LLM-based deep research agents are largely built on the ReAct framework. This linear design makes it difficult to revisit earlier states, branch into alternative search directions, or maintain global awareness under long contexts, often leading to local optima, redundant exploration, and inefficient search. We propose Re-TRAC, an agentic framework that performs cross-trajectory exploration by generating a structured state representation after each trajectory to summarize evidence, uncertainties, failures, and future plans, and conditioning subsequent trajectories on this state representation. This enables iterative reflection and globally informed planning, reframing research as a progressive process. Empirical results show that Re-TRAC consistently outperforms ReAct by 15-20% on BrowseComp with frontier LLMs. For smaller models, we introduce Re-TRAC-aware supervised fine-tuning, achieving state-of-the-art performance at comparable scales. Notably, Re-TRAC shows a monotonic reduction in tool calls and token usage across rounds, indicating progressively targeted exploration driven by cross-trajectory reflection rather than redundant search.

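Motivation and Goals

Address the limitations of linear ReAct reasoning in long-horizon deep research tasks, such as incomplete branching, forgetting, and convergence to local optima.
Enable cross-trajectory reflection that integrates evidence, uncertainties, failures, and future plans.
Provide a structured state representation that conditions subsequent trajectories and supports recursive global planning.
Demonstrate gains on BrowseComp and related benchmarks with frontier models, and give a training recipe for smaller models.
Show that RE-TRAC works as a test-time scaling method whose token and tool usage decreases across rounds.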

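Proposed Method

After each rollout, apply trajectory compression to produce a structured state S_t according to a fixed compression specification C.
Define S_t along three aspects: (i) answers and conclusions, (ii) evidence base and verification, (iii) uncertainties and exploration trace.
Run rollouts recursively, conditioning each new rollout on the state S_t accumulated over the previous rounds.
At test time, use Re-TRAC as a prompting strategy with no fine-tuning, iterating for at most N rounds (default N = 8) to produce the final answer.
For smaller models, generate SFT data from Re-TRAC trajectories to train models that can reason over the structured cross-trajectory summaries.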
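The rollout-compress-condition loop described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: `rollout` and `compress` are hypothetical stand-ins for a ReAct-style agent rollout and for the LLM call that applies the compression specification C; the `State` field names are illustrative labels for the three aspects of S_t; folding the previous state into each new one and stopping early on a confident answer are both assumptions. Only the cap of N = 8 rounds comes from the text.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class State:
    """Structured state S_t; the field names are illustrative, not the paper's."""
    conclusions: List[str] = field(default_factory=list)    # (i) answers and conclusions
    evidence: List[str] = field(default_factory=list)       # (ii) evidence base and verification
    uncertainties: List[str] = field(default_factory=list)  # (iii) uncertainties, failures, next steps


# Hypothetical interfaces (assumptions, not the authors' API):
# rollout(question, state) -> (trajectory, proposed answer or None, confident?)
Rollout = Callable[[str, Optional[State]], Tuple[List[str], Optional[str], bool]]
# compress(question, previous state, trajectory) -> next state, following spec C
Compressor = Callable[[str, Optional[State], List[str]], State]


def re_trac(question: str, rollout: Rollout, compress: Compressor,
            max_rounds: int = 8) -> Tuple[Optional[str], List[State]]:
    """Run up to `max_rounds` rollouts, each conditioned on the compressed state."""
    state: Optional[State] = None  # the first rollout has no prior state
    history: List[State] = []
    answer: Optional[str] = None
    for _ in range(max_rounds):
        trajectory, candidate, confident = rollout(question, state)
        if candidate is not None:
            answer = candidate  # keep the most recent proposed answer
        if confident:           # assumed early-stop rule
            break
        # Recursively fold the new trajectory into the previous state.
        state = compress(question, state, trajectory)
        history.append(state)
    return answer, history


# Toy stand-ins so the sketch runs end to end.
def toy_rollout(question: str, state: Optional[State]) -> Tuple[List[str], Optional[str], bool]:
    # Pretend the agent succeeds only after two recorded dead ends.
    dead_ends = len(state.uncertainties) if state else 0
    trajectory = [f"search attempt {dead_ends + 1}"]
    if dead_ends >= 2:
        return trajectory, "42", True
    return trajectory, None, False


def toy_compress(question: str, prev: Optional[State], trajectory: List[str]) -> State:
    prev = prev or State()
    return State(conclusions=list(prev.conclusions),
                 evidence=prev.evidence + trajectory,
                 uncertainties=prev.uncertainties + [f"dead end: {trajectory[-1]}"])


answer, states = re_trac("toy question", toy_rollout, toy_compress)
print(answer, len(states))  # prints: 42 2
```

In the toy run, the third rollout sees two recorded dead ends in the state and answers, so only two compressed states are produced. Passing the previous state into `compress` is what makes the compression recursive in this sketch: each S_t summarizes everything learned so far, not just the latest trajectory.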

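Experimental Results

Research Questions

RQ1: Does trajectory compression enable knowledge integration across trajectories and reduce incomplete branching in long-horizon tasks?
RQ2: Can RE-TRAC improve efficiency (fewer tool calls and tokens) across multiple rounds while maintaining or improving accuracy?
RQ3: Can smaller models reach or approach state-of-the-art performance when trained on Re-TRAC trajectories (SFT) or prompted with Re-TRAC?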
RQ4: How does RE-TRAC compare with other test-time scaling methods (majority voting (MV), weighted voting (WV), Best-of-N) on BrowseComp and related benchmarks?
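For reference when reading RQ4: the baseline test-time scaling methods aggregate several independent rollouts, whereas Re-TRAC conditions each rollout on the previous ones. The sketch below uses the common textbook definitions of the three aggregators; the per-rollout score and how the paper computes it are assumptions, and the toy answers "A", "B", "C" are invented for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# One entry per independent rollout: (final answer, score).
# Where the score comes from (self-reported confidence, a verifier, a reward
# model) is an assumption; the paper's exact scoring is not described here.
Sample = Tuple[str, float]


def majority_vote(samples: List[Sample]) -> str:
    """MV: the most frequent answer (ties go to the first one seen)."""
    return Counter(answer for answer, _ in samples).most_common(1)[0][0]


def weighted_vote(samples: List[Sample]) -> str:
    """WV: the answer whose scores sum highest (ties go to the first one seen)."""
    totals: Dict[str, float] = defaultdict(float)
    for answer, score in samples:
        totals[answer] += score
    return max(totals, key=lambda a: totals[a])


def best_of_n(samples: List[Sample]) -> str:
    """Best-of-N: the answer of the single highest-scoring rollout."""
    return max(samples, key=lambda s: s[1])[0]


samples = [("A", 0.2), ("A", 0.2), ("A", 0.2), ("B", 0.5), ("B", 0.5), ("C", 0.9)]
print(majority_vote(samples), weighted_vote(samples), best_of_n(samples))  # A B C
```

On this toy pool the three rules pick three different answers. None of them lets a later rollout see what an earlier one found, which is the gap Re-TRAC's cross-trajectory state is meant to close.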

Key Findings

Accuracy (%) on each benchmark; "-" means no score was reported.
Model	BrowseComp	BrowseComp-ZH	GAIA	XBench	HLE
Claude-4.5-Sonnet	24.1	42.4	71.2	66.0	32
o3	49.7	58.1	70.5	66.7	24.9
OpenAI DeepResearch	51.5	42.9	67.4	-	26.6
GPT-5-high	54.9	63.0	76.7	77.9	42
Gemini-3-pro	37.8	51.6	74.8	-	38.3
Kimi-K2-Thinking-1T	60.2	62.3	-	-	51.0
DeepSeek-V3.2-Thinking-685B	67.6	65.0	-	-	40.8
GLM-4.7-358B	52.0	66.6	-	-	42.8
MiniMax-M2-229B	44.0	48.5	75.7	72.0	31.8
Tongyi-DeepResearch-30B-A3B	43.4	46.7	70.9	75.0	32.9
IterResearch-30B-A3B	37.3	45.2	72.8	-	28.8
WebSailor-V2-30B-A3B (RL)	35.3	44.1	74.1	73.7	30.6
RE-TRAC-30B-A3B (Ours)	53.0	57.3	78.2	83.0	31.5
InfoAgent-14B	15.3	29.2	-	40.4	-
WebExplorer-8B	15.7	32.0	50.0	53.7	17.3
AgentCPM-Explore-4B	25.0	29.0	63.9	70.0	19.1
NestBrowse-4B	22.4	28.4	68.9	74.0	-
RE-TRAC-4B (Ours)	30.0	36.1	70.4	76.6	22.2

With frontier LLMs, RE-TRAC outperforms ReAct by 15–20% on BrowseComp.
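RE-TRAC-30B-A3B reaches 53.0% accuracy on BrowseComp and RE-TRAC-4B reaches 30.0%, outperforming the other models of comparable scale in the table (the best such baselines score 43.4% and 25.0%, respectively).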
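Tool calls and token usage decrease monotonically across RE-TRAC rounds, indicating that cross-trajectory reflection makes exploration progressively more targeted.
With SFT data built from structured state representations (Re-TRAC-aware SFT), smaller models reach state-of-the-art performance at comparable scales (RE-TRAC-4B and RE-TRAC-30B-A3B).
As a test-time scaling (TTS) method that needs no additional training, RE-TRAC achieves the best or competitive results across multiple models while using fewer resources than other TTS methods.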
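The same-scale comparison in the findings above can be checked directly against the table. The script below is a minimal sketch that copies the relevant rows and computes, per benchmark, how many points each RE-TRAC model is above (or below) the best reported baseline in its size group; grouping the rows into a 30B-A3B group and a small-model group follows the table's row order and is an assumption about what "comparable scale" means.

```python
from typing import Dict, List, Optional

BENCHMARKS = ["BrowseComp", "BrowseComp-ZH", "GAIA", "XBench", "HLE"]
# Scores copied from the table above; None marks a "-" (not reported) cell.
TABLE: Dict[str, List[Optional[float]]] = {
    "Tongyi-DeepResearch-30B-A3B": [43.4, 46.7, 70.9, 75.0, 32.9],
    "IterResearch-30B-A3B": [37.3, 45.2, 72.8, None, 28.8],
    "WebSailor-V2-30B-A3B (RL)": [35.3, 44.1, 74.1, 73.7, 30.6],
    "RE-TRAC-30B-A3B (Ours)": [53.0, 57.3, 78.2, 83.0, 31.5],
    "InfoAgent-14B": [15.3, 29.2, None, 40.4, None],
    "WebExplorer-8B": [15.7, 32.0, 50.0, 53.7, 17.3],
    "AgentCPM-Explore-4B": [25.0, 29.0, 63.9, 70.0, 19.1],
    "NestBrowse-4B": [22.4, 28.4, 68.9, 74.0, None],
    "RE-TRAC-4B (Ours)": [30.0, 36.1, 70.4, 76.6, 22.2],
}


def margins(ours: str, baselines: List[str]) -> Dict[str, Optional[float]]:
    """Points by which `ours` exceeds the best reported baseline on each benchmark."""
    result: Dict[str, Optional[float]] = {}
    for i, bench in enumerate(BENCHMARKS):
        reported = [TABLE[b][i] for b in baselines if TABLE[b][i] is not None]
        mine = TABLE[ours][i]
        if mine is None or not reported:
            result[bench] = None  # nothing to compare against
        else:
            result[bench] = round(mine - max(reported), 1)
    return result


# Size groups follow the table's row order (an assumption).
mid = margins("RE-TRAC-30B-A3B (Ours)",
              ["Tongyi-DeepResearch-30B-A3B", "IterResearch-30B-A3B", "WebSailor-V2-30B-A3B (RL)"])
small = margins("RE-TRAC-4B (Ours)",
                ["InfoAgent-14B", "WebExplorer-8B", "AgentCPM-Explore-4B", "NestBrowse-4B"])
print(mid)
print(small)
```

This gives BrowseComp margins of +9.6 points for the 30B-A3B group and +5.0 points for the small-model group. It also surfaces the one exception in these rows: on HLE, RE-TRAC-30B-A3B scores 1.4 points below Tongyi-DeepResearch-30B-A3B.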


This review was generated by AI and checked by human editors.