QUICK REVIEW

[Paper Review] Language Model Memory and Memory Models for Language

Benjamin L. Badger|arXiv (Cornell University)|Feb 13, 2026

Topic Modeling | Citations: 0

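One-Line Summary

This paper shows that standard language model embeddings retain little input information, whereas autoencoders form nearly perfect memories. It then introduces an encoder-decoder memory model and uses a combined objective with curriculum training to form and decode information-rich memories.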

ABSTRACT

The ability of machine learning models to store input information in hidden layer vector embeddings, analogous to the concept of 'memory', is widely employed but not well characterized. We find that language model embeddings typically contain relatively little input information regardless of data and compute scale during training. In contrast, embeddings from autoencoders trained for input regeneration are capable of nearly perfect memory formation. The substitution of memory embeddings for token sequences leads to substantial computational efficiencies, motivating the introduction of a parallelizable encoder-decoder memory model architecture. Upon causal training these models contain information-poor embeddings incapable of arbitrary information access, but by combining causal and information retention objective functions they learn to form and decode information-rich memories. Training can be further streamlined by freezing a high fidelity encoder followed by a curriculum training approach where decoders first learn to process memories and then learn to additionally predict next tokens. We introduce the perspective that next token prediction training alone is poorly suited for accurate memory formation as the objective itself is non-invertible, motivating the use of combined objective functions for models where the entire input is not exposed.

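Research Motivation and Objectives

Evaluate how much input information language model embeddings retain under different training regimes.
Compare causal language models, retrieval models, and autoencoders in terms of memory formation and invertibility.
Propose a parallelizable encoder-decoder memory architecture that allows arbitrary input information to be retrieved.
Present training strategies, such as frozen encoders and curriculum learning, that improve memory formation without sacrificing efficiency.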

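Proposed Method

Invert embeddings with a trainable decoder to measure information retention and reconstruct the input sequence.
Introduce an information quantification framework based on an entropy ratio and a Hamming-based token accuracy metric.
Develop a parallelizable encoder-decoder memory model and evaluate it under causal training and under combined objective functions.
Explore frozen-encoder memory models and curriculum training to decouple information retention from next-token prediction.
Use large pretrained LLMs as memory model decoders and evaluate scalability across model sizes.
Apply three evaluation modalities (encoder-decoder information retention, a copy task, and a blank copy task) to verify memory capability.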
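
To make the Hamming-based token accuracy metric listed above concrete, here is a minimal sketch. It assumes the metric is the fraction of positions at which the decoded token id matches the input token id; the paper's exact definition may differ, and the function name and token ids are hypothetical.

```python
from typing import Sequence


def hamming_token_accuracy(original: Sequence[int], reconstructed: Sequence[int]) -> float:
    """Fraction of positions where the reconstructed token id equals the original.

    Assumption: both sequences are aligned position by position; with a length
    mismatch, every missing or extra position counts as an error.
    """
    longest = max(len(original), len(reconstructed))
    if longest == 0:
        return 1.0  # two empty sequences are trivially identical
    matches = sum(a == b for a, b in zip(original, reconstructed))
    return matches / longest


# Hypothetical token ids for one 8-token input and its decoded reconstruction.
input_ids = [12, 7, 7, 301, 45, 9, 88, 2]
decoded_ids = [12, 7, 5, 301, 45, 9, 80, 2]
accuracy = hamming_token_accuracy(input_ids, decoded_ids)
print(f"token accuracy: {accuracy:.3f}")  # 6 of 8 positions match
```

An accuracy near 1.0 would indicate that the embedding retains almost all of the input, while a low value would indicate an information-poor embedding.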

Figure 1: Information retention experimental approach (left) and example training runs (right).

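Experimental Results

Research Questions

RQ1: How much input information do causal language models retain in their memory embeddings?
RQ2: Can memory models be trained to form information-rich, accurate memories that a separate decoder can decode?
RQ3: Does the encoder-decoder memory architecture offer computational advantages and memory capability compared with full-context models?
RQ4: Can training strategies (e.g., frozen encoders, curriculum learning, combined objectives) optimize memory formation without sacrificing language modeling performance?
RQ5: Do large pretrained language models function effectively as decoders for memory-augmented encoding?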

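Key Findings

Causal language model embeddings contain relatively little input information, regardless of data and compute scale during training.
Autoencoders trained for input regeneration form information-rich memories and come close to perfect memory formation.
A parallelizable encoder-decoder memory architecture trained with a combined objective improves memory formation and enables arbitrary information access.
Frozen-encoder memory models with curriculum training achieve efficient training and robust memory capability.
Memory models trained with a combination of causal and copy objectives can store and use information-rich memories while predicting next tokens, though exact performance depends on architecture choice and training regime.
Increasing model size alone yields only modest gains in memory model information retention when decoders are taken from large pretrained LLMs.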
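
The combined objective and the curriculum described in the findings above can be sketched as a weighted sum of a copy (information retention) loss and a next-token (causal) loss. This is an illustrative sketch only: the weighting scheme, the `switch_step` parameter, and all function names are assumptions, not the paper's implementation.

```python
import math
from typing import Sequence, Tuple


def cross_entropy(logits: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Mean negative log-likelihood of the target ids under softmax(logits)."""
    total = 0.0
    for row, target in zip(logits, targets):
        peak = max(row)  # subtract the max for numerical stability
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_norm - row[target]
    return total / len(targets)


def curriculum_weights(step: int, switch_step: int) -> Tuple[float, float]:
    """Return (copy_weight, causal_weight) for a two-phase curriculum.

    Assumption: phase 1 trains the decoder on copying only; phase 2 keeps the
    copy loss and adds next-token prediction with equal weight.
    """
    return (1.0, 0.0) if step < switch_step else (1.0, 1.0)


def combined_loss(copy_logits, input_ids, causal_logits, next_ids,
                  step: int, switch_step: int) -> float:
    """Weighted sum of the copy loss and the next-token loss at a training step."""
    copy_w, causal_w = curriculum_weights(step, switch_step)
    copy_term = cross_entropy(copy_logits, input_ids)
    causal_term = cross_entropy(causal_logits, next_ids)
    return copy_w * copy_term + causal_w * causal_term


# Hypothetical decoder outputs: 2 positions, vocabulary of 3, uniform logits.
uniform = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
phase1 = combined_loss(uniform, [0, 1], uniform, [1, 2], step=10, switch_step=100)
phase2 = combined_loss(uniform, [0, 1], uniform, [1, 2], step=200, switch_step=100)
print(phase1, phase2)  # phase 2 adds the causal term on top of the copy term
```

With uniform logits each term equals ln 3, so the phase 2 loss is twice the phase 1 loss. In a real training loop the two sets of logits would come from the decoder reading memory embeddings, with the encoder frozen in the curriculum setting.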

Figure 2: Memory Model Architecture and $n_{ctx}=256$ per chunk, $s=4$ chunk causal training characteristics on FineWeb. Mixers are $d_{m}=512$ for encoders, $d_{m}=1024$ for decoders and Transformers $d_{m}=256$ and $d_{m}=512$ for compute equivalence.

This review was generated by AI and checked by a human editor.