QUICK REVIEW

[Paper Review] TodoEvolve: Learning to Architect Agent Planning Systems

Jiaxi Liu, Yanzuo Jiang | arXiv (Cornell University) | Feb 8, 2026

AI-based Problem Solving and Planning | Citations: 0

One-Line Summary

TodoEvolve introduces PlanFactory to synthesize task-specific planning architectures and trains Todo-14B via Impedance-Guided Preference Optimization, optimizing planning topology, initialization, adaptation, and navigation.

ABSTRACT

Planning has become a central capability for contemporary agent systems in navigating complex, long-horizon tasks, yet existing approaches predominantly rely on fixed, hand-crafted planning structures that lack the flexibility to adapt to the structural diversity of open-ended problems. To address this limitation, we introduce TodoEvolve, a meta-planning paradigm that autonomously synthesizes and dynamically revises task-specific planning architectures. Specifically, we first construct PlanFactory, a modular design space that standardizes diverse planning paradigms within a unified codebase encompassing topology, initialization, adaptation, and navigation, thereby providing a common interface for heterogeneous planning patterns. Leveraging PlanFactory, we collect high-quality planning trajectories and train Todo-14B via Impedance-Guided Preference Optimization (IGPO), a multi-objective reinforcement learning objective that encourages the generation of planning systems that are performant, stable, and token-efficient across arbitrary tasks and agent backbones. Empirical evaluations on five agentic benchmarks demonstrate that TodoEvolve consistently surpasses carefully engineered planning modules while maintaining economical API costs and runtime overhead.

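Motivation and Objectives

Motivate the need for adaptable planning architectures that go beyond fixed, hand-crafted planners on open-ended tasks.
Propose PlanFactory as a unified design space for diverse planning topologies and mechanisms.
Develop Todo-14B with IGPO to jointly optimize planning performance, stability, and token efficiency.
Demonstrate generalization and Pareto efficiency across multiple agentic benchmarks.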

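Proposed Method

Define PlanFactory with four modules (Topology, Initialization, Adaptation, Navigation), enabling a unified codebase for diverse planners.
Decompose ten representative planning architectures into PlanFactory primitives to build a modular design space.
Introduce TodoEvolve, a meta-planner that synthesizes task-specific planning configurations and revises them dynamically during execution.
Train Todo-14B with Impedance-Guided Preference Optimization (IGPO), a multi-objective optimization over performance, stability, and token efficiency.
Build a high-quality planning dataset inside PlanFactory using Bootstrap-and-Filter, with Execution-as-Judge verification and impedance-based ranking driving IGPO.
Two-stage training regime: Stage 1 is SFT, which instills structural competence; Stage 2 is IGPO, which optimizes architectural efficiency.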
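
To make the data-construction idea above more concrete, here is a minimal sketch of how executed planning configurations might be turned into preference pairs. Everything in it is an assumption made for illustration: the `PlanConfig` and `Rollout` classes, the field values, and especially the `impedance` score (tokens plus a replanning penalty) are hypothetical and are not taken from the paper, which this review does not quote a formula for. It only illustrates the general pattern of "verify by execution, then rank by a cost-like score".

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class PlanConfig:
    """One point in a PlanFactory-style design space (values are hypothetical)."""
    topology: str        # e.g. "linear", "tree", "dag"
    initialization: str  # how the first plan is drafted
    adaptation: str      # when and how the plan is revised
    navigation: str      # how the next step is selected


@dataclass(frozen=True)
class Rollout:
    """Outcome of executing one candidate config on one task."""
    config: PlanConfig
    success: bool  # verdict from executing the plan (execution-as-judge)
    tokens: int    # total tokens consumed
    replans: int   # number of plan revisions, used here as an instability proxy


def impedance(r: Rollout, replan_weight: float = 500.0) -> float:
    """Hypothetical cost score, lower is better. Not the paper's definition."""
    return r.tokens + replan_weight * r.replans


def preference_pairs(rollouts: list[Rollout]) -> list[tuple[PlanConfig, PlanConfig]]:
    """Return (chosen, rejected) config pairs.

    A successful rollout beats a failed one. Between two successes, the one
    with lower impedance is preferred. Two failures, or two successes with
    equal impedance, yield no pair.
    """
    pairs = []
    for a, b in combinations(rollouts, 2):
        if not (a.success or b.success):
            continue
        if a.success != b.success:
            chosen, rejected = (a, b) if a.success else (b, a)
        elif impedance(a) == impedance(b):
            continue
        else:
            chosen, rejected = (a, b) if impedance(a) < impedance(b) else (b, a)
        pairs.append((chosen.config, rejected.config))
    return pairs
```

In this sketch the filter step is implicit: pairs in which neither rollout succeeded are dropped, so only configurations with at least one verified success contribute a training signal.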

Figure 1 : The overall inference workflow of TodoEvolve first constructs a customized planning system along four dimensions—topology, initialization, adaptation, and navigation, and then deploys it in real time to orchestrate agent execution.

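Experimental Results

Research Questions

RQ1: Can a meta-planning model synthesize task-specific planning architectures that outperform fixed planners across different domains?
RQ2: Can the unified PlanFactory codebase support efficient benchmarking and comparison of heterogeneous planning paradigms?
RQ3: Can IGPO reliably improve planning efficiency and stability across different backbones while preserving performance?
RQ4: How well does TodoEvolve generalize to open-ended, long-horizon tasks and to varying agent backbones?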

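Key Findings

TodoEvolve shows marked gains over carefully engineered planning modules across five benchmarks, including an improvement of up to 16.37% on GAIA with GPT-5-Mini.
TodoEvolve generalizes across diverse LLM backbones, raising GPT-5-Mini on xBench-DS to 75% in the reported setting.
TodoEvolve is robust on the high-complexity GAIA Level 3 tasks, reaching 53.85% with DeepSeek V3.2 and approaching the performance of stronger agents.
The framework keeps cost and latency comparable to advanced baselines while raising success rates, showing favorable Pareto efficiency.
Ablation studies show that SFT is essential for the structural foundation, while IGPO improves efficiency and long-horizon planning ability.
Case studies show dynamic, state-aware planning topologies that anticipate task evolution and access barriers.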
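
For readers unfamiliar with the Pareto-efficiency claim above, the following sketch shows how a cost-versus-success Pareto front can be computed. The function name and the tuple layout are assumptions chosen for illustration, and the example values in the usage note are placeholders, not results from the paper.

```python
def pareto_front(points):
    """Return names of non-dominated systems.

    points: list of (name, cost, success_rate).
    A system is dominated if another one has cost <= its cost and
    success_rate >= its success_rate, with at least one strict inequality.
    """
    front = []
    for name, cost, acc in points:
        dominated = any(
            (c2 <= cost and a2 >= acc) and (c2 < cost or a2 > acc)
            for _, c2, a2 in points
        )
        if not dominated:
            front.append(name)
    return front
```

As a usage example with made-up numbers, `pareto_front([("A", 1.0, 0.5), ("B", 2.0, 0.7), ("C", 2.5, 0.6)])` keeps `A` and `B` and drops `C`, because `B` is both cheaper and more successful than `C`.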

Figure 2 : Task-Dependent Performance Variability.

This review was generated by AI and checked by a human editor.