QUICK REVIEW

[Paper Review] InfiAgent: An Infinite-Horizon Framework for General-Purpose Autonomous Agents

Chenglin Yu, Yuchen Wang|arXiv (Cornell University)|Jan 6, 2026
Explainable Artificial Intelligence (XAI) | Citations: 0
One-Line Summary

InfiAgent externalizes persistent task state via a file-centric workspace and reconstructs a bounded reasoning context at each step, enabling long-horizon autonomy with a 20B open-source model that rivals larger proprietary agents on research tasks and maintains high long-horizon coverage.

ABSTRACT

LLM agents can reason and use tools, but they often break down on long-horizon tasks due to unbounded context growth and accumulated errors. Common remedies such as context compression or retrieval-augmented prompting introduce trade-offs between information fidelity and reasoning stability. We present InfiAgent, a general-purpose framework that keeps the agent's reasoning context strictly bounded regardless of task duration by externalizing persistent state into a file-centric state abstraction. At each step, the agent reconstructs context from a workspace state snapshot plus a fixed window of recent actions. Experiments on DeepResearch and an 80-paper literature review task show that, without task-specific fine-tuning, InfiAgent with a 20B open-source model is competitive with larger proprietary systems and maintains substantially higher long-horizon coverage than context-centric baselines. These results support explicit state externalization as a practical foundation for stable long-horizon agents. GitHub repo: https://github.com/ChenglinPoly/infiAgent
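
The per-step reconstruction described in the abstract can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not the InfiAgent implementation: the `Workspace` class, `build_context`, `run_demo`, the file names, and the snapshot format are all hypothetical, and no LLM is called. It only shows that the rebuilt context stays the same size while the workspace keeps growing.

```python
import os
import tempfile
from collections import deque


class Workspace:
    """Hypothetical file-centric persistent state: files hold the task memory."""

    def __init__(self, root):
        self.root = root

    def write(self, name, text):
        # State transition: an action changes the workspace, not the prompt.
        with open(os.path.join(self.root, name), "w", encoding="utf-8") as f:
            f.write(text)

    def snapshot(self, max_files=5, preview_chars=40):
        # List only the last max_files names (in sorted order) with short previews.
        names = sorted(os.listdir(self.root))
        lines = [f"{len(names)} files total"]
        for name in names[-max_files:]:
            with open(os.path.join(self.root, name), encoding="utf-8") as f:
                preview = f.read(preview_chars).replace("\n", " ")
            lines.append(f"{name}: {preview}")
        return "\n".join(lines)


def build_context(workspace, recent_actions):
    """Rebuild the reasoning context from a snapshot plus recent actions only."""
    return (
        "# Workspace snapshot\n" + workspace.snapshot()
        + "\n# Recent actions\n" + "\n".join(recent_actions)
    )


def run_demo(steps=50, window=3):
    """Run a toy loop and record the context size at every step."""
    with tempfile.TemporaryDirectory() as root:
        ws = Workspace(root)
        recent = deque(maxlen=window)  # fixed window of recent actions
        sizes = []
        for t in range(steps):
            name = f"note_{t:03d}.txt"
            ws.write(name, f"finding from step {t:03d}")
            recent.append(f"wrote {name}")
            sizes.append(len(build_context(ws, recent)))
        return sizes, len(os.listdir(root))
```

Calling `run_demo()` returns the context length at each step together with the final file count. Once the action window and the snapshot cap are full, the context length stops changing even though a new file is written at every step.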

Motivation and Objectives

  • Motivate the need for stable long-horizon autonomy in LLM agents and identify how unbounded context harms reasoning over time.
  • Propose a file-centric persistent state abstraction that keeps reasoning context strictly bounded.
  • Describe a hierarchical multi-level agent architecture and external attention to manage large documents.
  • Evaluate long-horizon stability and performance on research-oriented benchmarks without task-specific fine-tuning.

Proposed Method

  • Formalize the separation between persistent task state and bounded reasoning context.
  • Define the persistent state F_t as a file-system workspace that evolves via state-transition operators T(F_t, a_t).
  • Construct bounded reasoning context c_t^{bounded} from a workspace snapshot and a fixed window of recent actions.
  • Implement a hierarchical agent stack (Alpha, Domain, Atomic) with Agent-as-a-Tool calls to manage complexity and reduce tool-call chaos.
  • Introduce an External Attention Pipeline to extract task-relevant information from large documents without inflating the main reasoning context.
  • Evaluate on DeepResearch benchmark and a long-horizon 80-paper literature review, comparing against larger models and ablations.
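
The Agent-as-a-Tool idea in the list above can be sketched as follows. This is an illustrative sketch only: the `Agent` class, the `as_tool` method, `build_stack`, and the agent and tool names are hypothetical and are not taken from the InfiAgent code base. A real agent would let a model choose which tool to call, whereas this toy version simply calls every registered tool in order.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Agent:
    """Hypothetical agent node that can only call the tools registered on it."""

    name: str
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def as_tool(self) -> Callable[[str], str]:
        # Expose this agent to its parent as an ordinary callable tool.
        return self.run

    def run(self, task: str) -> str:
        # Stand-in for model-driven tool selection: call every tool in order.
        self.log.append(task)
        results = [tool(task) for tool in self.tools.values()]
        return f"{self.name}({'; '.join(results)})"


def build_stack():
    """Wire a three-level stack: Alpha -> Domain -> Atomic (names are made up)."""
    search = Agent("atomic_search", {"web": lambda q: f"hits for '{q}'"})
    write = Agent("atomic_write", {"file": lambda q: f"saved notes on '{q}'"})
    domain = Agent(
        "domain_literature",
        {"search": search.as_tool(), "write": write.as_tool()},
    )
    alpha = Agent("alpha", {"literature": domain.as_tool()})
    return alpha, domain, search, write
```

In this sketch the top-level agent sees a single tool, `literature`, and never touches the leaf tools directly. Restricting each level to the callables one level below is one plausible way to keep the tool surface small at every level.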
Figure 1: The InfiAgent Framework. InfiAgent implements a hierarchical execution stack over a file-centric persistent state. Files serve as the authoritative task memory, while an external attention mechanism processes heavy documents outside the bounded reasoning context. Periodic state consolidati

Experimental Results

Research Questions

  • RQ1: Can explicit externalization of persistent state into a file-centric workspace stabilize long-horizon LLM agents without task-specific fine-tuning?
  • RQ2: Does a hierarchical agent architecture with bounded context reconstruction improve reliability and coverage on long-running research tasks?
  • RQ3: Is external attention effective for processing large documents while preserving bounded reasoning context?
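
To make RQ3 concrete, the sketch below shows one way a large document could be processed outside the main reasoning context so that only a short extract is handed back. It is a toy stand-in under stated assumptions: the function name `external_attention`, its parameters, and the keyword-overlap scoring are hypothetical, and the review does not say how the paper's External Attention Pipeline selects information.

```python
def external_attention(document, query, chunk_chars=200, top_k=2, max_chars=300):
    """Return a short, query-relevant extract of a large document.

    Keyword overlap is used here as a simple stand-in for whatever
    relevance model a real pipeline would apply to each chunk.
    """
    terms = set(query.lower().split())
    chunks = [
        document[i:i + chunk_chars]
        for i in range(0, len(document), chunk_chars)
    ]

    def score(chunk):
        # Count words in the chunk that match a query term.
        return sum(1 for w in chunk.lower().split() if w.strip(".,;:") in terms)

    scored = [(score(c), c) for c in chunks]
    relevant = [(s, c) for s, c in scored if s > 0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    extract = " ... ".join(c.strip() for _, c in relevant[:top_k])
    # Hard cap so the extract cannot inflate the caller's context.
    return extract[:max_chars]
```

The caller passes the full document and a query, and receives at most `max_chars` characters back. Whatever the document length, the amount of text entering the main context is capped by that parameter.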

Key Findings

Main results (with file-centric state; InfiAgent vs. baselines)

  Model                          Max   Min   Avg
  InfiAgent                       80    15   67.1
  Gemini-3-Flash                  80    80   80.0
  Claude-4.5-Sonnet               80    80   80.0
  Claude Code                     80    11   29.1
  Cursor | Claude-4.5-Sonnet       5     0    1.0
  Cursor | Gemini-3-Flash          1     0    0.1

Ablation (remove file-centric state; compressed long-context prompts): No File State (Compressed Context)

  Model                          Max   Min   Avg
  GPT-OSS-20B                      7     1    3.2
  Gemini-3-Flash                  25    20   21.1
  Claude-4.5-Sonnet               77    11   27.7

  • With a 20B open-source model, InfiAgent achieves competitive DeepResearch performance compared to larger proprietary systems.
  • InfiAgent attains strong instruction-following and readability, contributing to stable long-horizon behavior.
  • On the 80-paper literature review, InfiAgent achieves high coverage and maintains stability across hundreds of steps, outperforming baselines using compressed long-context prompts.
  • Ablation removing the file-centric state substantially degrades coverage, supporting the importance of explicit persistent state externalization.
  • The long-horizon literature review shows that InfiAgent with the 20B model achieves up to 80.0 coverage with certain backbones, while the ablations without file-centric state show a drop in average coverage.
Figure 2: Component-wise comparison on DeepResearch. Scores are broken down by evaluation dimension. InfiAgent shows strong performance on instruction following and readability, which are closely related to structured state management and output control.

This review was generated by AI and checked by a human editor.