[Paper Explained] To Retrieve or To Think? An Agentic Approach for Context Evolution
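ACE introduces a multi-agent framework that dynamically alternates between external retrieval and internal reasoning to evolve the context, achieving higher accuracy on multi-hop QA while using fewer tokens than iterative baselines.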
Current context augmentation methods, such as retrieval-augmented generation, are essential for solving knowledge-intensive reasoning tasks. However, they typically adhere to a rigid, brute-force strategy that executes retrieval at every step. This indiscriminate approach not only incurs unnecessary computational costs but also degrades performance by saturating the context with irrelevant noise. To address these limitations, we introduce Agentic Context Evolution (ACE), a framework inspired by human metacognition that dynamically determines whether to seek new evidence or reason with existing knowledge. ACE employs a central orchestrator agent to make decisions strategically via majority voting. It aims to alternate between activating a retriever agent for external retrieval and a reasoner agent for internal analysis and refinement. By eliminating redundant retrieval steps, ACE maintains a concise and evolved context. Extensive experiments on challenging multi-hop QA benchmarks demonstrate that ACE significantly outperforms competitive baselines in accuracy while achieving efficient token consumption. Our work provides valuable insights into advancing context-evolved generation for complex, knowledge-intensive tasks.
Research Motivation and Objectives
- Motivate context augmentation beyond brute-force retrieval for knowledge-intensive tasks.
- Propose ACE to dynamically balance external retrieval and internal reasoning through a central orchestrator.
- Show that ACE improves accuracy on multi-hop QA benchmarks while reducing token consumption compared with baselines.
Proposed Method
- Model an interleaved retrieve-think cycle controlled by a central orchestrator that uses majority voting.
- Use a multi-agent committee where each agent decides between RETRIEVE or THINK at each round.
- Retrieve adds new external context to memory; Think generates a sub-query and internal answer to refine memory.
- After N rounds, synthesize final answer from the evolved context via a dedicated generation function.
- Evaluate on multi-hop QA datasets with accuracy and average token consumption as primary metrics.
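The retrieve-think cycle described above can be sketched as a short loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `majority_vote`, `ace_loop`, and all `toy_*` helpers are hypothetical, the voters are hard-coded stubs standing in for LLM agents, and breaking ties in favor of RETRIEVE is an arbitrary choice made here.

```python
from collections import Counter
from typing import Callable, List, Tuple

# Hypothetical type aliases; the paper does not specify these interfaces.
Voter = Callable[[str, List[str]], str]  # returns "RETRIEVE" or "THINK"
Retriever = Callable[[str, List[str]], List[str]]
Thinker = Callable[[str, List[str]], Tuple[str, str]]  # (sub_query, answer)
Generator = Callable[[str, List[str]], str]


def majority_vote(votes: List[str]) -> str:
    """Pick the action with the most votes; ties go to RETRIEVE (assumption)."""
    counts = Counter(votes)
    return "THINK" if counts["THINK"] > counts["RETRIEVE"] else "RETRIEVE"


def ace_loop(question: str, voters: List[Voter], retrieve: Retriever,
             think: Thinker, generate: Generator,
             n_rounds: int) -> Tuple[str, List[str]]:
    """Run n_rounds of vote-then-act, then synthesize a final answer."""
    memory: List[str] = []   # the evolving context
    actions: List[str] = []  # log of the action chosen in each round
    for _ in range(n_rounds):
        action = majority_vote([v(question, memory) for v in voters])
        actions.append(action)
        if action == "RETRIEVE":
            # External retrieval adds new context to memory.
            memory.extend(retrieve(question, memory))
        else:
            # Internal reasoning adds a sub-query and its answer to memory.
            sub_query, answer = think(question, memory)
            memory.append(f"{sub_query} -> {answer}")
    return generate(question, memory), actions


# Toy stand-ins so the sketch runs without any model or index.
def toy_voter_retrieve_if_empty(question: str, memory: List[str]) -> str:
    return "THINK" if memory else "RETRIEVE"


def toy_voter_always_think(question: str, memory: List[str]) -> str:
    return "THINK"


def toy_retrieve(question: str, memory: List[str]) -> List[str]:
    return [f"doc-{len(memory)}"]


def toy_think(question: str, memory: List[str]) -> Tuple[str, str]:
    return ("sub-question", "intermediate answer")


def toy_generate(question: str, memory: List[str]) -> str:
    return f"answer from {len(memory)} items"


if __name__ == "__main__":
    committee = [toy_voter_retrieve_if_empty, toy_voter_retrieve_if_empty,
                 toy_voter_always_think]
    print(ace_loop("Who founded X?", committee, toy_retrieve,
                   toy_think, toy_generate, n_rounds=3))
```

With this toy committee, the first round retrieves because memory is empty, and later rounds switch to thinking, which mirrors the idea of skipping retrieval once enough evidence is in context. An odd committee size avoids ties altogether.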

Experimental Results
Research Questions
- RQ1: Can agentic control of when to retrieve vs. think improve performance over static RAG pipelines?
- RQ2: What is the optimal iteration depth for ACE across different datasets?
- RQ3: How does ACE balance retrieval cost with internal reasoning to reduce token usage while maintaining accuracy?
- RQ4: How does the Think proportion (REASON actions) evolve with more rounds across datasets?
- RQ5: Does ACE maintain or improve robustness against noise vs. traditional iterative retrieval methods?
Key Findings
- ACE achieves state-of-the-art accuracy on three multi-hop QA benchmarks (MultiHop-RAG, HotpotQA, 2WikiQA).
- ACE reduces token consumption compared with brute-force iterative baselines (e.g., 10,653 vs. 18,196 on MultiHop-RAG).
- Increasing iteration depth yields higher accuracy up to dataset-specific optima (e.g., N=5 for MultiHop-RAG; N=3 for HotpotQA and 2WikiQA).
- The Think (internal reasoning) action proportion increases with more rounds, indicating dynamic prioritization of reasoning over retrieval.
- Single-step ACE aligns with standard RAG performance, confirming the necessity of multiple rounds for gains.
- ACE demonstrates that dynamic, metacognitive context evolution can outperform static retrieval-augmented baselines in both accuracy and efficiency.
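For a sense of scale, the MultiHop-RAG token figures quoted in the findings (10,653 vs. 18,196) imply a relative reduction of roughly 41%. This percentage is derived here from those two numbers and is not a figure quoted from the paper:

```latex
\[
\frac{18{,}196 - 10{,}653}{18{,}196} = \frac{7{,}543}{18{,}196} \approx 0.415
\]
```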
This summary was generated by AI and reviewed by a human editor.