QUICK REVIEW

[Paper Review] Neural Turing Machines

Alex Graves, Greg Wayne, Ivo Danihelka | arXiv (Cornell University) | Oct 20, 2014

Neural Networks and Applications | 40 references | 108 citations

TL;DR

This paper introduces the Neural Turing Machine (NTM), an architecture that couples a neural network controller to an external memory through attention-based read and write mechanisms. Because every component is differentiable, the whole system can be trained end-to-end with gradient descent. From input-output examples alone, the NTM learns to infer and execute simple algorithms such as copying, sorting, and associative recall, and it outperforms standard LSTM networks on these algorithmic tasks.

ABSTRACT

We extend the capabilities of neural networks by coupling them to external memory resources, which they can interact with by attentional processes. The combined system is analogous to a Turing Machine or Von Neumann architecture but is differentiable end-to-end, allowing it to be efficiently trained with gradient descent. Preliminary results demonstrate that Neural Turing Machines can infer simple algorithms such as copying, sorting, and associative recall from input and output examples.

Motivation & Objective

To develop a neural network architecture capable of learning and executing simple algorithms from input-output demonstrations.
To address the limitation of standard RNNs in handling complex data transformations requiring external memory and logical flow control.
To create a differentiable, end-to-end trainable system inspired by Turing machines and working memory, enabling gradient-based learning of algorithmic procedures.
To investigate whether neural networks can learn to use memory in a structured, addressable way to solve algorithmic tasks.
To demonstrate generalization beyond training data by learning to perform tasks such as sorting and associative recall using learned memory operations.

Proposed method

The NTM integrates a differentiable memory matrix that can be read from and written to using attentional mechanisms.
The controller network (feedforward or LSTM-based) emits parameters for each head, from which normalized read and write weightings over the memory locations are computed.
Read operations return a weighted sum of the memory vectors, with the weighting produced by a combination of content-based and location-based addressing.
Write operations update memory in two differentiable steps, an erase followed by an add, each scaled by the write weighting.
The architecture supports multiple read and write heads, enabling parallel access to memory for complex tasks.
The entire system is trained end-to-end using backpropagation with gradient clipping and RMSProp optimization.
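The memory operations listed above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming the usual NTM formulation (cosine-similarity content addressing, interpolation with the previous weighting, circular shift, sharpening, and an erase-then-add write). The function and parameter names (`content_weights`, `location_weights`, `gate`, `shift`, `gamma`) are chosen for this example and do not come from the review. In a real model the controller would emit these parameters at every step and everything would run on tensors with automatic differentiation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-8)  # small constant avoids division by zero

def content_weights(memory, key, beta):
    """Content addressing: softmax over beta-scaled similarity to each row."""
    scores = [beta * cosine(row, key) for row in memory]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def location_weights(w_content, w_prev, gate, shift, gamma):
    """Location addressing: interpolate, shift circularly, then sharpen.

    shift maps an integer offset to its probability, e.g. {0: 0.1, 1: 0.9}.
    """
    n = len(w_content)
    w_g = [gate * c + (1.0 - gate) * p for c, p in zip(w_content, w_prev)]
    w_s = [sum(w_g[(i - off) % n] * prob for off, prob in shift.items())
           for i in range(n)]
    powered = [w ** gamma for w in w_s]
    total = sum(powered)
    return [p / total for p in powered]

def read(memory, w):
    """Read vector: weighted sum of the memory rows."""
    width = len(memory[0])
    return [sum(w[i] * memory[i][j] for i in range(len(memory)))
            for j in range(width)]

def write(memory, w, erase, add):
    """Erase then add, each scaled by the weighting of the row."""
    return [[m * (1.0 - w_i * e) + w_i * a
             for m, e, a in zip(row, erase, add)]
            for row, w_i in zip(memory, w)]
```

For example, writing `[1.0, 0.0, 1.0]` to row 1 of an all-zero memory with a one-hot weighting, then querying with the same vector as the key and a large `beta`, puts almost all of the content weighting back on row 1. Passing that weighting through `location_weights` with `shift={1: 1.0}` moves the focus to row 2, which is how a head can step through memory sequentially.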

Experimental results

Research questions

RQ1: Can a neural network learn to perform algorithmic tasks such as copying and sorting using an external, differentiable memory?
RQ2: Can the NTM generalize to sequences longer than those seen during training, indicating true algorithmic learning?
RQ3: How does the use of attention-based addressing improve performance on memory-intensive tasks compared to standard RNNs?
RQ4: Can the NTM learn to sort sequences based on priority values without explicit supervision on the sorting mechanism?
RQ5: To what extent can the NTM’s memory usage be interpreted as implementing known data structures, such as binary heaps?
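To make RQ1 and RQ2 concrete, the sketch below builds one example for a copy task: a sequence of random binary vectors, a delimiter step, and then blank input steps during which the model is expected to reproduce the sequence. The helper name `make_copy_example`, the default vector width, and the extra delimiter channel are assumptions made for this illustration and are not specified in the review.

```python
import random

def make_copy_example(seq_len, width=8, rng=random):
    """Build one (inputs, targets) pair for a copy task.

    Each input step has width + 1 channels; the last channel is 1 only
    on the delimiter step that tells the model to start answering.
    """
    seq = [[rng.randint(0, 1) for _ in range(width)] for _ in range(seq_len)]
    inputs = [bits + [0] for bits in seq]                   # presentation phase
    inputs.append([0] * width + [1])                        # delimiter step
    inputs += [[0] * (width + 1) for _ in range(seq_len)]   # blank answer phase
    # Targets are zero until the answer phase, then repeat the sequence.
    targets = [[0] * width for _ in range(seq_len + 1)] + seq
    return inputs, targets
```

Length generalization (RQ2) can then be probed by training on examples with small `seq_len` values and evaluating on larger ones. The specific length ranges are left open here.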

Key findings

The NTM successfully learned to copy sequences of varying lengths, generalizing beyond the training sequence length.
For the associative recall task, the NTM achieved high accuracy on test sequences not seen during training, demonstrating robust generalization.
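The NTM solved the priority sort task with a write pattern that closely matched a linear function of the input priorities, followed by reads that step through the memory locations in order. This suggests that it uses each vector's priority to choose where to write it, so that a sequential read returns the sorted sequence.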
NTMs with either a feedforward or an LSTM controller substantially outperformed standard LSTM networks on the priority sort task.
Multiple heads mattered on this task: the feedforward controller needed eight parallel read and write heads to reach its best performance.
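The number of parameters in the NTM does not grow with the number of memory locations, so the memory can be enlarged without enlarging the model. In a standard LSTM, by contrast, the parameter count grows quadratically with the number of hidden units.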
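The priority sort finding above (write locations that follow a linear function of the priority) has a simple discrete analogue, sketched below. It is not the learned mechanism, which uses soft attention weightings rather than hard indices. The helper names, the priority range of [-1, 1], and the slot count are assumptions made for this illustration.

```python
def priority_to_slot(priority, num_slots):
    """Linear map from a priority in [-1, 1] to a memory row index."""
    return round((priority + 1.0) / 2.0 * (num_slots - 1))

def sort_by_linear_write(items, num_slots=128):
    """Place each payload at a priority-dependent slot, then read in order.

    items is a list of (priority, payload) pairs. Payloads come back in
    ascending priority order as long as no two items share a slot; items
    that collide keep their arrival order within that slot.
    """
    memory = [[] for _ in range(num_slots)]
    for priority, payload in items:
        memory[priority_to_slot(priority, num_slots)].append(payload)
    # Sequential read over the slots yields the sorted payloads.
    return [payload for slot in memory for payload in slot]
```

Because the position of each item depends only on its own priority, no comparisons between items are needed. That is consistent with a model that writes each vector independently and then reads memory sequentially.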

This review was created by AI and reviewed by human editors.