QUICK REVIEW

[Paper Review] Experience-Driven Multi-Agent Systems Are Training-free Context-aware Earth Observers

Pengyu Dai, Weihao Xuan|arXiv (Cornell University)|Jan 30, 2026

Multimodal Machine Learning Applications | Citations: 0

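One-Line Summary

GeoEvolver is a training-free, experience-driven multi-agent system that accumulates fine-grained EO tool-execution priors in a memory bank and improves end-to-end Earth Observation tasks without parameter updates. It decomposes queries, explores tool configurations, and distills failures into reusable memory.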

ABSTRACT

Recent advances have enabled large language model (LLM) agents to solve complex tasks by orchestrating external tools. However, these agents often struggle in specialized, tool-intensive domains that demand long-horizon execution, tight coordination across modalities, and strict adherence to implicit tool constraints. Earth Observation (EO) tasks exemplify this challenge due to the multi-modal and multi-temporal data inputs, as well as the requirements of geo-knowledge constraints (spectrum library, spatial reasoning, etc.): many high-level plans can be derailed by subtle execution errors that propagate through a pipeline and invalidate final results. A core difficulty is that existing agents lack a mechanism to learn fine-grained, tool-level expertise from interaction. Without such expertise, they cannot reliably configure tool parameters or recover from mid-execution failures, limiting their effectiveness in complex EO workflows. To address this, we introduce GeoEvolver, a self-evolving multi-agent system (MAS) that enables LLM agents to acquire EO expertise through structured interaction without any parameter updates. GeoEvolver decomposes each query into independent sub-goals via a retrieval-augmented multi-agent orchestrator, then explores diverse tool-parameter configurations at the sub-goal level. Successful patterns and root-cause attribution from failures are then distilled in an evolving memory bank that provides in-context demonstrations for future queries. Experiments on three tool-integrated EO benchmarks show that GeoEvolver consistently improves end-to-end task success, with an average gain of 12% across multiple LLM backbones, demonstrating that EO expertise can emerge progressively from efficient, fine-grained interactions with the environment.

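Motivation and Objectives

Identify why EO failures stem not only from planning but also from poorly grounded execution.
Propose GeoEvolver, which acquires EO expertise through structured interaction without updating model parameters.
Show that a memory of execution experience improves end-to-end EO task success across multiple LLM backbones.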

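Proposed Method

Decompose each EO query into modular sub-goals, each assigned to a specialized executor.
Use a retrieval-augmented orchestrator that assembles sub-goals from a memory bank of patterns and failures.
Run parallel exploration, allowing multiple variants and retries, to find robust tool configurations.
Judge and verify sub-goal trajectories, and propagate success/failure signals to memory.
Maintain a two-tier memory system: a global Memory Bank and a local Working Memory.
Iteratively distill success patterns and failure attributions into the memory bank through single-variant extraction and contrastive distillation.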
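
The loop described in this list (retrieve, explore variants, verify, distill) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' implementation: `MemoryBank`, `MemoryItem`, `explore`, the word-overlap retrieval, and the NDVI tool with made-up reflectances are all hypothetical. Exploration runs sequentially here, whereas the paper describes parallel exploration.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryItem:
    subgoal: str
    note: str      # distilled success pattern or failure attribution
    success: bool


@dataclass
class MemoryBank:
    """Global store of distilled experience (hypothetical structure)."""
    items: list = field(default_factory=list)

    def retrieve(self, subgoal, k=3):
        # Toy retrieval: rank stored items by word overlap with the sub-goal.
        words = set(subgoal.lower().split())
        ranked = sorted(
            self.items,
            key=lambda m: len(words & set(m.subgoal.lower().split())),
            reverse=True,
        )
        return ranked[:k]


def explore(subgoal, variants, run_tool, verify, bank):
    """Try several tool configurations, then distill the outcome into memory."""
    working_memory = []  # local, per-query trace of (params, ok, detail)
    for params in variants:  # sequential stand-in for parallel exploration
        try:
            result = run_tool(**params)
            ok = verify(result)
            detail = f"result={result:.2f}"
        except Exception as exc:  # a tool error counts as a failed variant
            ok, detail = False, f"{type(exc).__name__}: {exc}"
        working_memory.append((params, ok, detail))

    wins = [t for t in working_memory if t[1]]
    losses = [t for t in working_memory if not t[1]]
    for params, _, detail in wins:
        bank.items.append(MemoryItem(subgoal, f"works: {params} ({detail})", True))
    # Toy contrastive step: record what each failure changed relative to a success.
    good = wins[0][0] if wins else {}
    for params, _, detail in losses:
        diff = {k: v for k, v in params.items() if good.get(k) != v}
        bank.items.append(MemoryItem(subgoal, f"avoid: {diff} ({detail})", False))
    return wins[0][0] if wins else None


# Hypothetical tool: NDVI from two named bands of one vegetated pixel.
PIXEL = {"B4": 0.05, "B8": 0.45}  # made-up red / near-infrared reflectances


def ndvi(red, nir):
    r, n = PIXEL[red], PIXEL[nir]
    return (n - r) / (n + r)


bank = MemoryBank()
best = explore(
    "compute NDVI for vegetated pixel",
    [
        {"red": "B8", "nir": "B4"},   # swapped bands: runs, but fails verification
        {"red": "B4", "nir": "B08"},  # wrong band name: tool raises KeyError
        {"red": "B4", "nir": "B8"},   # valid configuration
    ],
    run_tool=ndvi,
    verify=lambda v: v > 0,  # a vegetated pixel should have positive NDVI
    bank=bank,
)
for item in bank.retrieve("compute NDVI"):
    print(item.success, item.note)
```

The point of the sketch is that both failure modes (a silently wrong result and a hard tool error) end up as reusable notes next to the working configuration, so a later query with a similar sub-goal can retrieve them as in-context hints.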

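Experimental Results

Research Questions

RQ1: Does GeoEvolver improve end-to-end EO task performance across diverse LLM backbones?
RQ2: How does model capability affect GeoEvolver's gains across EO benchmarks?
RQ3: Is GeoEvolver robust across EO benchmarks with different tool-modality couplings?
RQ4: How does GeoEvolver compare with existing memory-based and multi-agent EO methods?
RQ5: How do the number of executors, the number of inference variants, and the number of memory items affect performance?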

Key Findings

Method	Tool-A-O ↑	Tool-I-O ↑	Tool-E-M ↑	Efficiency ↓	Accuracy ↑
Expel	32.72	25.94	22.48	1.79	22.58
Zhao et al. (Training-free GRPO)	57.24	44.36	36.44	1.36	31.25
Chase (DeepAgents)	41.67	33.98	25.45	1.06	29.69
Earth-Agent-MAS	32.28	26.96	20.91	1.47	15.87
Ours (GeoEvolver)	57.66	44.66	39.06	1.47	76.56
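
To read the table at a glance, the rows can be loaded into a small Python structure and compared column by column. This is only a convenience sketch: the numbers are copied from the table, the short method labels are abbreviations chosen here, and the helper names are hypothetical. The only column semantics assumed are the arrows in the header (↑ higher is better, ↓ lower is better).

```python
# Rows copied from the table: (Tool-A-O, Tool-I-O, Tool-E-M, Efficiency, Accuracy)
rows = {
    "Expel": (32.72, 25.94, 22.48, 1.79, 22.58),
    "Training-free GRPO": (57.24, 44.36, 36.44, 1.36, 31.25),
    "DeepAgents": (41.67, 33.98, 25.45, 1.06, 29.69),
    "Earth-Agent-MAS": (32.28, 26.96, 20.91, 1.47, 15.87),
    "GeoEvolver": (57.66, 44.66, 39.06, 1.47, 76.56),
}
columns = ["Tool-A-O", "Tool-I-O", "Tool-E-M", "Efficiency", "Accuracy"]
lower_is_better = {"Efficiency"}  # marked with a down arrow in the header


def best_method(col):
    """Name of the method with the best value in the given column."""
    i = columns.index(col)
    pick = min if col in lower_is_better else max
    return pick(rows, key=lambda name: rows[name][i])


def margin_over_best_baseline(col, ours="GeoEvolver"):
    """Difference between `ours` and the strongest baseline in that column."""
    i = columns.index(col)
    baselines = [v[i] for name, v in rows.items() if name != ours]
    best = min(baselines) if col in lower_is_better else max(baselines)
    return round(rows[ours][i] - best, 2)


for col in columns:
    print(col, best_method(col), margin_over_best_baseline(col))
```

Read this way, GeoEvolver leads on the three tool-level columns by small margins and on Accuracy by a wide one, while the best (lowest) Efficiency value in the table belongs to the DeepAgents row.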

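GeoEvolver raised end-to-end accuracy on EO benchmarks by an average of 12.56 points across multiple backbones.
Smaller models tend to benefit disproportionately from memory-augmented experience; for example, Qwen3-32B improved from 24.80% to 46.96% (+22.16 pp).
Gains in end-to-end accuracy can come with lower step-level scores: a solution may be functionally correct yet deviate from the human reference trajectory.
GeoEvolver outperforms memory-based methods and fixed-workflow MAS baselines such as Earth-Agent-MAS on the Earth-agent benchmark (e.g., 76.56% vs. 15.87%).
Ablations show that self-contrast and parallel exploration contribute the most; removing either causes a marked drop.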
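
The findings quote gains in percentage points (pp), which are not the same as relative improvements. A short Python check of the Qwen3-32B figures makes the distinction concrete; the helper name `gain` is hypothetical, and the relative figure is derived here rather than reported in the review.

```python
def gain(before, after):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    pp = after - before
    relative = 100 * pp / before
    return round(pp, 2), round(relative, 1)


pp, relative = gain(24.80, 46.96)  # Qwen3-32B accuracy before and after
print(f"+{pp} pp absolute, +{relative}% relative")
```

The absolute difference reproduces the +22.16 pp stated above, while the relative improvement over the 24.80% starting point is much larger, so the two should not be compared directly.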


This review was generated by AI and checked by a human editor.