QUICK REVIEW

[Paper Review] RE-TRAC: REcursive TRAjectory Compression for Deep Search Agents

Jialiang Zhu, Gongrui Zhang|arXiv (Cornell University)|Feb 2, 2026

Multimodal Machine Learning Applications|0 citations

TL;DR

RE-TRAC introduces recursive trajectory compression to ReAct-style deep search agents, enabling cross-trajectory reflection and globally informed planning to improve long-horizon search performance; it achieves 15–20% gains on BrowseComp with frontier LLMs and offers a training recipe for small models.

ABSTRACT

LLM-based deep research agents are largely built on the ReAct framework. This linear design makes it difficult to revisit earlier states, branch into alternative search directions, or maintain global awareness under long contexts, often leading to local optima, redundant exploration, and inefficient search. We propose Re-TRAC, an agentic framework that performs cross-trajectory exploration by generating a structured state representation after each trajectory to summarize evidence, uncertainties, failures, and future plans, and conditioning subsequent trajectories on this state representation. This enables iterative reflection and globally informed planning, reframing research as a progressive process. Empirical results show that Re-TRAC consistently outperforms ReAct by 15-20% on BrowseComp with frontier LLMs. For smaller models, we introduce Re-TRAC-aware supervised fine-tuning, achieving state-of-the-art performance at comparable scales. Notably, Re-TRAC shows a monotonic reduction in tool calls and token usage across rounds, indicating progressively targeted exploration driven by cross-trajectory reflection rather than redundant search.

Motivation & Objective

Address limitations of linear ReAct reasoning in long-horizon deep research tasks (e.g., incomplete branches, forgetting, local optima).
Enable cross-trajectory reflection and consolidation of evidence, uncertainties, failures, and future plans.
Provide a structured state representation to condition subsequent trajectories and enable recursive, global planning.
Demonstrate gains on BrowseComp and related benchmarks using frontier models, and show a training recipe for smaller models.
Show that Re-TRAC can serve as a test-time scaling method that reduces token/tool usage over rounds.

Proposed method

Introduce trajectory compression after each rollout to create a structured state S_t via a fixed compression specification C.
Define S_t by three facets: (i) Answer & Conclusions, (ii) Evidence Base & Verification, (iii) Uncertainties & Exploration Trace.
Recursively execute rollouts where each new rollout conditions on the accumulated state S_t from prior rounds.
Apply Re-TRAC as a prompting strategy at test time without model fine-tuning; iterate up to N rounds (default 8) to produce final answers.
For small models, generate SFT data from Re-TRAC trajectories to train models that ground reasoning on structured cross-trajectory summaries.
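
The recursive loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `State`, `re_trac`, `rollout`, and `compress` are hypothetical, the rollout and compression steps are passed in as plain callables standing in for LLM calls, and reading the final answer from the last state is an assumption. The toy stubs at the end exist only to make the sketch runnable.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class State:
    """Hypothetical container for S_t, using the three facets listed above."""
    answer_and_conclusions: str = ""
    evidence_and_verification: List[str] = field(default_factory=list)
    uncertainties_and_trace: List[str] = field(default_factory=list)


Trajectory = List[str]  # assumption: a trajectory is a list of step strings
Rollout = Callable[[str, Optional[State]], Trajectory]
Compress = Callable[[Trajectory, Optional[State]], State]


def re_trac(question: str, rollout: Rollout, compress: Compress,
            max_rounds: int = 8) -> Tuple[str, List[State]]:
    """Run up to max_rounds rollouts, each conditioned on the prior state."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    state: Optional[State] = None
    history: List[State] = []
    for _ in range(max_rounds):
        # The new rollout sees the accumulated state, not the raw past trajectories.
        trajectory = rollout(question, state)
        # Compress the trajectory, together with the old state, into a new state.
        state = compress(trajectory, state)
        history.append(state)
    # Assumption: the final answer is read from the last state's first facet.
    return state.answer_and_conclusions, history


# Toy stand-ins for the LLM-backed steps (illustration only).
def toy_rollout(question: str, state: Optional[State]) -> Trajectory:
    n_seen = 0 if state is None else len(state.evidence_and_verification)
    return [f"search step {n_seen + 1} for: {question}"]


def toy_compress(trajectory: Trajectory, state: Optional[State]) -> State:
    previous = [] if state is None else state.evidence_and_verification
    evidence = previous + trajectory
    return State(answer_and_conclusions=f"answer after {len(evidence)} steps",
                 evidence_and_verification=evidence)


if __name__ == "__main__":
    answer, states = re_trac("example question", toy_rollout, toy_compress,
                             max_rounds=3)
    print(answer, len(states))
```

Passing the state rather than the full history into each rollout is what keeps the context bounded across rounds in this sketch; how the real compression specification C is worded is not given in this review.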

Experimental results

Research questions

RQ1: Does trajectory compression enable cross-trajectory knowledge consolidation and reduce incomplete branches in long-horizon tasks?
RQ2: Can Re-TRAC improve efficiency (fewer tool calls and tokens) while maintaining or improving accuracy across rounds?
RQ3: Can smaller models achieve state-of-the-art or competitive results when trained or prompted with Re-TRAC trajectories (SFT)?
RQ4: How does Re-TRAC compare to other test-time scaling methods (MV, WV, Best-of-N) on BrowseComp and related benchmarks?
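
For context on RQ4, the baseline test-time scaling methods can be sketched as simple aggregators over independent rollouts. The review does not define MV and WV; the sketch below assumes they mean majority voting and weighted voting in the usual sense, and the function names and the (answer, score) input format are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

ScoredAnswer = Tuple[str, float]  # assumption: (answer text, confidence score)


def majority_vote(answers: List[str]) -> str:
    """Return the most frequent answer; ties go to the first one seen."""
    return Counter(answers).most_common(1)[0][0]


def weighted_vote(scored: List[ScoredAnswer]) -> str:
    """Return the answer with the largest summed score."""
    totals: Dict[str, float] = defaultdict(float)
    for answer, score in scored:
        totals[answer] += score
    return max(totals, key=totals.get)


def best_of_n(scored: List[ScoredAnswer]) -> str:
    """Return the single highest-scored answer."""
    return max(scored, key=lambda pair: pair[1])[0]


if __name__ == "__main__":
    # Made-up scores for illustration; not results from the paper.
    runs = [("A", 0.4), ("B", 0.9), ("A", 0.3), ("A", 0.3)]
    print(majority_vote([a for a, _ in runs]))
    print(weighted_vote(runs))
    print(best_of_n(runs))
```

All three aggregate rollouts that never see each other, whereas Re-TRAC conditions each rollout on a compressed state from the previous ones, which is the difference RQ4 probes.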

Key findings

Re-TRAC achieves absolute gains of 15–20% over ReAct on BrowseComp with frontier LLMs.
The RE-TRAC-30B-A3B model attains 53% accuracy on BrowseComp, and the RE-TRAC-4B model attains 30%, outperforming several baselines of similar size.
Re-TRAC shows monotonic reductions in tool calls and token usage across rounds, indicating more targeted exploration guided by cross-trajectory reflection.
With SFT data grounded on structured state representations, small models reach state-of-the-art performance at comparable scales (e.g., RE-TRAC-4B and RE-TRAC-30B-A3B).
Used as a training-free test-time scaling (TTS) method, Re-TRAC yields the best or competitive results across multiple models, with lower resource usage than other TTS methods.

This review was created by AI and reviewed by human editors.