QUICK REVIEW

[Paper Review] Associative Long Short-Term Memory

Ivo Danihelka, Greg Wayne|arXiv (Cornell University)|Feb 9, 2016

Advanced Memory and Neural Computing | References: 18 | Citations: 108

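One-Line Summary

The paper introduces Associative LSTM, a memory-augmented RNN that stores key–value pairs in a redundant holographic memory and improves memorization speed and capacity without increasing the number of network parameters. It integrates HRR-based binding with the LSTM gates and with multiple read/write copies to reduce retrieval noise.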

ABSTRACT

We investigate a new method to augment recurrent neural networks with extra memory without increasing the number of network parameters. The system has an associative memory based on complex-valued vectors and is closely related to Holographic Reduced Representations and Long Short-Term Memory networks. Holographic Reduced Representations have limited capacity: as they store more information, each retrieval becomes noisier due to interference. Our system in contrast creates redundant copies of stored information, which enables retrieval with reduced noise. Experiments demonstrate faster learning on multiple memorization tasks.

Motivation and Objectives

Augment LSTM with extra memory without adding network parameters.
Develop an associative, key–value memory using holographic reduced representations.
Introduce redundancy through multiple memory copies to reduce retrieval noise.
Integrate the redundant associative memory with LSTM gates to form a unified architecture.
Demonstrate faster learning and competitive performance on memorization and sequence tasks.

Proposed Method

Represent key–value pairs using holographic reduced representations with binding via complex-valued operations.
Create redundant memory traces by storing multiple copies of each key–value pair, with each copy using an independent random permutation of the key.
Retrieve a value by unbinding each copy with the complex conjugate of its permuted key and averaging the results across copies.
Integrate the associative memory into LSTM by producing complex-valued keys and using update rules that mirror LSTM gating (forget, input, output) with complex-valued components.
Update all copies in parallel and allow head-like reading and writing with multiple keys (memory heads).
Compare against baselines (LSTM, Permutation RNN, Unitary RNN, Multiplicative Unitary RNN) and assess learning speed and accuracy across tasks.
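
The binding, redundant storage, and averaged retrieval steps above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: keys are drawn with unit-modulus elements and random phases, binding is element-wise complex multiplication, and each copy permutes the key indices. All function names (`random_unit_keys`, `make_permutations`, `write_memory`, `read_memory`, `retrieval_mse`) are hypothetical, and the LSTM gating is left out.

```python
import numpy as np


def random_unit_keys(rng, n_items, dim):
    """Draw complex keys whose elements all have modulus 1 (random phases)."""
    phases = rng.uniform(0.0, 2.0 * np.pi, size=(n_items, dim))
    return np.exp(1j * phases)


def make_permutations(rng, n_copies, dim):
    """One fixed random permutation of the key indices per memory copy."""
    return np.stack([rng.permutation(dim) for _ in range(n_copies)])


def write_memory(keys, values, perms):
    """Bind each value to its permuted key and sum the pairs into every copy."""
    permuted_keys = keys[:, perms]              # (n_items, n_copies, dim)
    bound = permuted_keys * values[:, None, :]  # element-wise complex binding
    return bound.sum(axis=0)                    # (n_copies, dim)


def read_memory(memory, key, perms):
    """Unbind with the conjugate of the permuted key, then average the copies."""
    permuted_key = key[perms]                   # (n_copies, dim)
    return np.mean(np.conj(permuted_key) * memory, axis=0)


def retrieval_mse(n_items, n_copies, dim=256, seed=0):
    """Mean squared error of retrieving the first stored value (assumed setup)."""
    rng = np.random.default_rng(seed)
    keys = random_unit_keys(rng, n_items, dim)
    values = (rng.standard_normal((n_items, dim))
              + 1j * rng.standard_normal((n_items, dim)))
    perms = make_permutations(rng, n_copies, dim)
    memory = write_memory(keys, values, perms)
    retrieved = read_memory(memory, keys[0], perms)
    return float(np.mean(np.abs(retrieved - values[0]) ** 2))
```

Comparing `retrieval_mse(20, 1)` with `retrieval_mse(20, 8)` gives a quick check of how averaging over copies affects the error for the same stored items. The memory size grows with the number of copies, but no trainable weights are involved in this sketch, which is consistent with the claim that capacity is added without extra parameters.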

Experimental Results

Research Questions

RQ1: Can an associative, memory-augmented LSTM store and retrieve key–value pairs with higher capacity without increasing parameter count?
RQ2: Does redundant storage via multiple copies reduce retrieval noise and improve learning speed on memorization and sequence tasks?
RQ3: How does associative memory integrate with LSTM gates to preserve sequence modeling capabilities while enabling memory addressing?
RQ4: How does the Associative LSTM compare to standard LSTM and other memory-augmented models on canonical tasks (episodic copy, XML modeling, variable assignment, arithmetic, Wikipedia)?

Key Findings

The redundant associative memory enables larger effective memory capacity without increasing network parameters.
Retrieval noise grows with the number of stored items and shrinks as more copies with independent random permutations are averaged, so the retrieval error stays roughly constant when the number of copies scales with the number of items.
Associative LSTM achieves faster learning on episodic copy and XML modeling tasks compared with LSTM and competitive results on other tasks, especially as the number of copies increases.
On the episodic copy task, associative memory with multiple copies speeds up learning; a single-copy Associative LSTM is competitive with, but not always superior to, a larger LSTM.
On the XML modeling task, Associative LSTM shows significant advantages with more copies, outperforming or matching LSTM in several configurations.
On variable assignment and arithmetic tasks, multiple reading/writing heads (copies) help Associative LSTM solve tasks more efficiently, though task details vary with copy count and architecture.
On Wikipedia language modeling, Associative LSTM performs comparably to LSTM, indicating it is at least as general as LSTM for sequence modeling.
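
The relationship between stored items, copies, and retrieval noise can be motivated with a short derivation. This is a sketch under assumptions that the summary does not state: keys $r_k$ have unit-modulus elements with independent uniform phases, values $x_k$ are independent of the keys with per-element second moment $\sigma_x^2$, $\circledast$ (from `amssymb`) denotes element-wise complex multiplication, $P_s$ is the permutation used by copy $s$, and the cross terms of different copies are treated as uncorrelated.

```latex
\begin{align}
c_s &= \sum_{j=1}^{N_{\text{items}}} (P_s r_j) \circledast x_j \\
\tilde{x}_k &= \frac{1}{N_{\text{copies}}} \sum_{s=1}^{N_{\text{copies}}}
               \overline{(P_s r_k)} \circledast c_s
             = x_k + \frac{1}{N_{\text{copies}}} \sum_{s=1}^{N_{\text{copies}}}
               \sum_{j \neq k} \overline{(P_s r_k)} \circledast (P_s r_j) \circledast x_j \\
\mathbb{E}\,\bigl|\tilde{x}_k[d] - x_k[d]\bigr|^2
            &\approx \frac{(N_{\text{items}} - 1)\,\sigma_x^2}{N_{\text{copies}}}
\end{align}
```

The first term of the second line is exact because each key element has modulus 1. Every remaining term has zero mean, so averaging over copies reduces its variance, and under these assumptions the error stays roughly constant when the number of copies grows in proportion to the number of stored items.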

This review was generated by AI and checked by a human editor.