QUICK REVIEW

[Paper Review] M$^2$: Dual-Memory Augmentation for Long-Horizon Web Agents via Trajectory Summarization and Insight Retrieval

Dawei Yan, Haokui Zhang|arXiv (Cornell University)|Feb 28, 2026

Multimodal Machine Learning Applications | Citations: 0

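One-Line Summary

M$^2$ is a training-free dual-memory framework for long-horizon web navigation. It combines internal trajectory summarization with external insight retrieval to improve robustness and efficiency without fine-tuning.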

ABSTRACT

Multimodal Large Language Models (MLLMs) based agents have demonstrated remarkable potential in autonomous web navigation. However, handling long-horizon tasks remains a critical bottleneck. Prevailing strategies often rely heavily on extensive data collection and model training, yet still struggle with high computational costs and insufficient reasoning capabilities when facing complex, long-horizon scenarios. To address this, we propose M$^2$, a training-free, memory-augmented framework designed to optimize context efficiency and decision-making robustness. Our approach incorporates a dual-tier memory mechanism that synergizes Dynamic Trajectory Summarization (Internal Memory) to compress verbose interaction history into concise state updates, and Insight Retrieval Augmentation (External Memory) to guide the agent with actionable guidelines retrieved from an offline insight bank. Extensive evaluations across WebVoyager and OnlineMind2Web demonstrate that M$^2$ consistently surpasses baselines, yielding up to a 19.6% success rate increase and 58.7% token reduction for Qwen3-VL-32B, while proprietary models like Claude achieve accuracy gains up to 12.5% alongside significantly lower computational overhead.

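Motivation and Objectives

Close the gap between efficiency and robustness by curbing context explosion in long-horizon web navigation.
Develop a memory architecture that combines internal trajectory summarization with external insight retrieval.
Demonstrate that a memory-augmented agent can stay competitive with SFT/RL-based models while saving tokens and compute.
Show that insights transfer across models and across domains over a diverse set of web domains.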

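Proposed Method

Introduce Internal Memory (Dynamic Trajectory Summarization): at every step the agent itself generates a summarized state abstraction s_t.
Introduce External Memory (Insight Retrieval Augmentation): an offline Insight Bank is built from successful trajectories, and the top-i insights are retrieved by semantic similarity.
Replace the full context with the dual-memory context C_t' = {P_sys, Q, M_int_t, M_ext, O_t}, keeping the context compact yet informative.
Use a prompt-driven agent that outputs three elements, T_t, A_t, and s_t, where s_t summarizes the visual feedback and the executed action.
Build the Insight Bank by using an abstraction model to distill high-relevance interaction rules from successful trajectories; for retrieval, the query is encoded with a Sentence Transformer.
At inference time, retrieve the top-i insights by cosine similarity and inject them into the system prompt as defensive hints that guide the agent's actions.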
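
The retrieval and context-assembly steps listed above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The bag-of-words `embed` function is a toy stand-in for the Sentence Transformer encoder, and the names `retrieve_insights` and `DualMemoryContext` are hypothetical. Storing each insight next to the task description it was distilled from is also an assumption, since the review does not say how bank entries are indexed.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a Sentence Transformer encoder."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_insights(query: str, bank: list, top_i: int = 2) -> list:
    """Return the insights of the top_i bank entries most similar to the query.

    `bank` holds (source_task, insight) pairs; this layout is an assumption.
    """
    q_vec = embed(query)
    ranked = sorted(bank, key=lambda entry: cosine(q_vec, embed(entry[0])), reverse=True)
    return [insight for _, insight in ranked[:top_i]]


@dataclass
class DualMemoryContext:
    """Holds the pieces of C_t' = {P_sys, Q, M_int_t, M_ext, O_t}."""
    p_sys: str
    query: str
    m_ext: list = field(default_factory=list)  # retrieved insights, fixed per task
    m_int: list = field(default_factory=list)  # running list of step summaries s_t

    def record_step(self, s_t: str) -> None:
        """Append the agent's own summary of the step it just executed."""
        self.m_int.append(s_t)

    def build(self, o_t: str) -> dict:
        """Assemble the compact context for the current observation O_t."""
        return {
            "P_sys": self.p_sys,
            "Q": self.query,
            "M_int_t": list(self.m_int),
            "M_ext": list(self.m_ext),
            "O_t": o_t,
        }
```

In this sketch only the short summaries s_t accumulate across steps, while raw screenshots and verbose interaction text from earlier steps are dropped. That is the intuition behind the context savings reported later in the review. How the dictionary is rendered into an actual prompt is left out.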

Figure 1: Average token cost and accuracy of the baseline and M$^2$ across models on WebVoyager.

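Experimental Results

Research Questions

RQ1: Can a training-free dual-memory architecture improve long-horizon web navigation performance without SFT/RL?
RQ2: Can internal trajectory summarization and external insights effectively shrink the context while preserving decision quality?
RQ3: Do cross-trajectory insights transfer across different web domains and model backbones?
RQ4: Compared with a full-context baseline, what efficiency trade-offs (such as token cost and latency) does the dual-memory framework entail?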

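Key Findings

M$^2$ shows consistent improvements across model backbones on the WebVoyager and OnlineMind2Web benchmarks.
With Qwen3-VL-32B, M$^2$ achieves a success-rate gain of up to 19.6% and a token reduction of more than 50% on both benchmarks.
With Claude models, accuracy gains reach up to 12.5%, alongside substantial token reductions of 30.3% to 55.0%.
Internal memory cut the token cost on WebVoyager by about 57% (from 215.2k to 92.3k).
External memory adds robustness: on the Google Maps domain, introducing insights raised accuracy from 63.4% to 80.5%.
The Insight Bank enables cross-task transfer, retrieves quickly (about 6 ms per query), and supports stable long-horizon planning.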
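
As a quick sanity check on the figures quoted above, the short Python snippet below recomputes the relative token reduction and the accuracy gain from the reported numbers. The helper name `relative_reduction` is hypothetical, and the inputs are taken directly from this review.

```python
def relative_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100.0


# WebVoyager token cost with internal memory: 215.2k -> 92.3k
token_cut = relative_reduction(215.2, 92.3)

# Google Maps accuracy with external insights: 63.4% -> 80.5%
accuracy_gain = 80.5 - 63.4

print(f"Token reduction: {token_cut:.1f}%")             # 57.1%
print(f"Accuracy gain: {accuracy_gain:.1f} points")     # 17.1 points
```

The token figure matches the "about 57%" stated above. The Google Maps improvement corresponds to 17.1 percentage points in absolute terms.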

Figure 2: Overview of the proposed framework. (a) The baseline agent operates with raw context containing redundant visual history and verbose interaction text. This creates high computational overhead and introduces noise that may impair decision-making. Our method M$^2$ incorporates two key mechani

This review was generated by AI and checked by a human editor.