QUICK REVIEW

[Paper Review] HyMem: Hybrid Memory Architecture with Dynamic Retrieval Scheduling

Xiaochen Zhao, Kaikai Wang | arXiv (Cornell University) | Feb 15, 2026

Topic Modeling | Citations: 0

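One-Line Summary

tldr: HyMem introduces a dual-granularity memory architecture with dynamic on-demand retrieval and a reflection mechanism, achieving a state-of-the-art balance of efficiency and performance on LoCoMo and LongMemEval while cutting computational cost by up to 92.6%.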

ABSTRACT

Large language model (LLM) agents demonstrate strong performance in short-text contexts but often underperform in extended dialogues due to inefficient memory management. Existing approaches face a fundamental trade-off between efficiency and effectiveness: memory compression risks losing critical details required for complex reasoning, while retaining raw text introduces unnecessary computational overhead for simple queries. The crux lies in the limitations of monolithic memory representations and static retrieval mechanisms, which fail to emulate the flexible and proactive memory scheduling capabilities observed in humans, thus struggling to adapt to diverse problem scenarios. Inspired by the principle of cognitive economy, we propose HyMem, a hybrid memory architecture that enables dynamic on-demand scheduling through multi-granular memory representations. HyMem adopts a dual-granular storage scheme paired with a dynamic two-tier retrieval system: a lightweight module constructs summary-level context for efficient response generation, while an LLM-based deep module is selectively activated only for complex queries, augmented by a reflection mechanism for iterative reasoning refinement. Experiments show that HyMem achieves strong performance on both the LOCOMO and LongMemEval benchmarks, outperforming full-context while reducing computational cost by 92.6%, establishing a state-of-the-art balance between efficiency and performance in long-term memory management.

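Motivation and Objectives

Address the memory trade-off between compression and raw-text retention, motivated by the need for more efficient long-context reasoning.
Propose a dual-granularity memory system (Level-1 summaries and Level-2 raw text) with dynamic, on-demand retrieval.
Introduce an LLM-guided deep retrieval module and a reflection mechanism that iteratively refine answers.
Demonstrate performance and efficiency gains over full-context and baseline methods on long-context benchmarks (LoCoMo/LongMemEval).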

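Proposed Method

Segment raw dialogue into event-level units and store each at two granularities: a Level-1 summary and the Level-2 raw text.
Embed the Level-1 summaries with an embedding model and create backtracking links to the corresponding Level-2 content.
Use a lightweight memory module to retrieve Level-1 units quickly and cheaply, forming a quick context.
When the lightweight path is insufficient, activate the deep memory module: after a coarse recall, LLM-based self-retrieval identifies the relevant Level-1/Level-2 content.
Incorporate a reflection module that checks completeness and iteratively refines the query for multi-round reasoning.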
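
The steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class and function names (`MemoryUnit`, `HybridMemory`, `answer`) are hypothetical, the bag-of-words `embed` is a toy stand-in for a real embedding model, and the three callables `is_complete`, `select`, and `refine_query` stand in for the LLM-based completeness check, self-retrieval, and reflection steps, whose actual prompts and logic the review does not describe.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumption: replaces a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


@dataclass
class MemoryUnit:
    summary: str     # Level-1: compressed event summary
    raw_text: str    # Level-2: original dialogue, reached via the backtracking link
    vector: Counter  # embedding of the Level-1 summary


class HybridMemory:
    def __init__(self) -> None:
        self.units: List[MemoryUnit] = []

    def add_event(self, raw_text: str, summary: str) -> None:
        # Store both granularities together so Level-1 hits can reach Level-2.
        self.units.append(MemoryUnit(summary, raw_text, embed(summary)))

    def recall(self, query: str, k: int) -> List[MemoryUnit]:
        # Rank Level-1 summaries by similarity to the query and keep the top k.
        q = embed(query)
        ranked = sorted(self.units, key=lambda u: cosine(q, u.vector), reverse=True)
        return ranked[:k]


def answer(
    memory: HybridMemory,
    query: str,
    is_complete: Callable[[str, List[str]], bool],
    select: Callable[[str, List[MemoryUnit]], List[MemoryUnit]],
    refine_query: Callable[[str, List[str]], str],
    max_rounds: int = 2,
) -> Tuple[List[str], str]:
    """Return (context, path), where path is "light" or "deep"."""
    # Lightweight path: summaries only.
    context = [u.summary for u in memory.recall(query, k=2)]
    if is_complete(query, context):
        return context, "light"
    # Deep path: coarse recall, selection, then Level-2 raw text.
    for _ in range(max_rounds):
        candidates = memory.recall(query, k=4)
        context = [u.raw_text for u in select(query, candidates)]
        if is_complete(query, context):
            break
        # Reflection: rewrite the query before the next round.
        query = refine_query(query, context)
    return context, "deep"
```

In this sketch a query whose answer already appears in a summary returns on the "light" path without touching raw text, while a query that needs a detail dropped by summarization falls through to the "deep" path and follows the link to the Level-2 dialogue.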

Figure 1 : Conventional lightweight methods struggle with complex tasks, while sophisticated approaches incur high overhead for simple queries. In contrast, our HyMem dynamically allocates memory resources based on task demands, achieving dual optimization of performance and efficiency.

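Experimental Results

Research Questions

RQ1: How does the dual-granularity design affect the efficiency and effectiveness of long-term dialogue reasoning in LLM agents?
RQ2: Does dynamic scheduling between lightweight and deep memory retrieval yield a better trade-off than static retrieval strategies?
RQ3: Does the reflection mechanism help mitigate incomplete reasoning and hallucination on complex queries?
RQ4: How does HyMem perform against full-context and baseline methods on benchmarks that demand long-term memory and multi-hop reasoning, such as LoCoMo/LongMemEval?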

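Key Findings

HyMem achieves a state-of-the-art balance between efficiency and performance on the LoCoMo and LongMemEval benchmarks.
It maintains or improves accuracy relative to the full-context baseline while reducing computational cost by up to 92.6%.
Level-1 (summary) and Level-2 (raw text) memory enable effective retrieval by dynamically activating the deep module only for complex queries.
The lightweight module handles roughly 70% of queries efficiently; for harder cases, the deep module is invoked, guided by the reflection mechanism.
Ablation studies show substantial token savings when the lightweight module and reflection are used together, with controlled accuracy loss when memory granularity is adjusted.
In the compression analysis, event-level compression preserves more critical information than generic pre-compression methods, reducing the risk of forgetting.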
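
To see why routing most queries through the lightweight module saves tokens, the following sketch computes an average per-query cost under a simple assumed cost model: every query pays for the lightweight attempt, and only escalated queries additionally pay for the deep pass. The function names and this cost model are illustrative assumptions, not taken from the paper, and the sketch is not meant to reproduce the reported 92.6% figure.

```python
def expected_tokens(light_fraction: float, light_cost: float, deep_cost: float) -> float:
    """Average tokens per query when `light_fraction` of queries stop at the
    lightweight path and the rest also run the deep module (assumed model)."""
    if not 0.0 <= light_fraction <= 1.0:
        raise ValueError("light_fraction must be between 0 and 1")
    return light_cost + (1.0 - light_fraction) * deep_cost


def cost_reduction(avg_tokens: float, full_context_tokens: float) -> float:
    """Fractional saving relative to feeding the full context on every query."""
    if full_context_tokens <= 0:
        raise ValueError("full_context_tokens must be positive")
    return 1.0 - avg_tokens / full_context_tokens


# Only the 70% routing share comes from the review; the token counts below
# are hypothetical placeholders chosen for illustration.
avg = expected_tokens(light_fraction=0.7, light_cost=1000, deep_cost=5000)
saving = cost_reduction(avg, full_context_tokens=20000)
print(f"average tokens per query: {avg:.0f}, saving: {saving:.1%}")
```

With these placeholder values the average comes to about 2,500 tokens per query, a saving of about 87.5% over the assumed full-context cost. The saving grows as the share of queries resolved on the lightweight path increases.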

Figure 2 : Our approach demonstrates superior efficiency, achieving the best balance between performance and computational cost (measured in tokens) on the LOCOMO benchmark.

This review was generated by AI and checked by a human editor.