[Paper Review] On the Turing Completeness of Modern Neural Network Architectures
The paper proves that Transformer and Neural GPU architectures are Turing complete based on their capacity to compute and access internal dense representations, without external memory, under arbitrary precision assumptions. It also analyzes the role of positional encodings and compares to prior results.
Alternatives to recurrent neural networks, in particular, architectures based on attention or convolutions, have been gaining momentum for processing input sequences. In spite of their relevance, the computational properties of these alternatives have not yet been fully explored. We study the computational power of two of the most paradigmatic architectures exemplifying these mechanisms: the Transformer (Vaswani et al., 2017) and the Neural GPU (Kaiser & Sutskever, 2016). We show both models to be Turing complete exclusively based on their capacity to compute and access internal dense representations of the data. In particular, neither the Transformer nor the Neural GPU requires access to an external memory to become Turing complete. Our study also reveals some minimal sets of elements needed to obtain these completeness results.
Motivation & Objective
- Motivate the study of the computational power of non-recurrent neural architectures (attention and convolutions) for learning algorithms.
- Formally define Turing completeness for seq-to-seq neural networks within a rational-precision framework.
- Show that Transformer and Neural GPU achieve Turing completeness without external memory, under arbitrary internal precision.
- Identify minimal elements needed to obtain Turing completeness for these architectures.
Proposed method
- Provide formal definitions for seq-to-seq recognizers and Turing completeness (embedding, seed, and final vector sets).
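- Show that encoder–decoder RNNs are Turing complete, following the results of Siegelmann & Sontag: a bounded number of neurons suffices, provided the weights are rational with arbitrary precision and the activation is suitably chosen.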
- Formalize Transformer architecture with attention, encoders/decoders, and positional encodings, using hard attention in proofs.
- Demonstrate that Transformer with positional encodings is Turing complete by simulating a Turing machine.
- Analyze the Neural GPU as a seq-to-seq model and prove uniform Neural GPUs are Turing complete by simulating an RNN encoder–decoder.
- Discuss differences with standard Transformer implementations and the necessity of arbitrary precision.
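The hard attention mentioned above can be illustrated with a minimal sketch. It assumes a dot-product score and a hardmax that spreads weight evenly over the positions with the highest score; the function names and the example vectors are hypothetical and not taken from the paper. Exact rationals (`fractions.Fraction`) stand in for the arbitrary-precision assumption.

```python
from fractions import Fraction


def hardmax(scores):
    """Give uniform weight to the positions attaining the maximum score, zero elsewhere."""
    top = max(scores)
    ties = sum(1 for s in scores if s == top)
    return [Fraction(1, ties) if s == top else Fraction(0) for s in scores]


def dot(u, v):
    """Dot product, used here as the (assumed) attention score function."""
    return sum(a * b for a, b in zip(u, v))


def hard_attention(query, keys, values):
    """Average the values whose keys score highest against the query."""
    weights = hardmax([dot(query, key) for key in keys])
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


# Keys 0 and 2 tie for the best score, so their values (2, 0) and (4, 2)
# are averaged to (3, 1); the value attached to key 1 is ignored.
result = hard_attention([1, 0], [[1, 0], [0, 1], [1, 0]], [[2, 0], [5, 5], [4, 2]])
```

Because every weight is an exact rational, no rounding occurs at any step, which is the kind of unrestricted precision the completeness proofs depend on.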
Experimental results
Research questions
- RQ1: Can modern attention- or convolution-based architectures achieve Turing completeness without external memory?
- RQ2: What minimal architectural components (e.g., positional encodings, hard attention) are required to achieve Turing completeness for Transformers?
- RQ3: How does the Neural GPU's structure enable simulation of RNN-based computations within a bounded architecture?
- RQ4: What are the trade-offs between practical finite-precision hardware and theoretical unrestricted precision in establishing completeness?
Key findings
- Transformer networks with positional encodings are Turing complete under arbitrary precision assumptions.
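- A Transformer without positional encodings is order-invariant and proportion-invariant: its output depends only on the proportions in which symbols occur in the input, not on their order or on the input length. It therefore cannot recognize certain regular languages, which shows how limited its power is without position information.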
- The paper provides a constructive proof that a Transformer can simulate a Turing machine using one encoder layer and three decoder layers with a specific dense representation size.
- Uniform Neural GPUs are Turing complete by simulating seq-to-seq RNNs, connecting Neural GPU computation to classic RNN-based recognizers.
- The results are presented with formal proofs (appendix contains full details) and rely on rational activations and rational-valued internal representations.
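The order- and proportion-invariance finding above can be checked on a toy model. This is a minimal sketch, not the paper's construction: it assumes one residual self-attention layer with softmax attention and no positional encoding, followed by a single decoder-style read. The embeddings, the seed query, and all function names are hypothetical.

```python
import math

# Hypothetical 2-d embeddings for a two-letter alphabet (illustrative values).
EMBED = {"a": (1.0, 0.0), "b": (0.0, 1.0)}


def attend(query, keys, values):
    """Softmax dot-product attention of one query over key/value pairs."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    dim = len(values[0])
    return tuple(sum(e / total * v[j] for e, v in zip(exps, values)) for j in range(dim))


def encode_then_read(word):
    """Self-attention encoder without positional encoding, then one attention read."""
    xs = [EMBED[ch] for ch in word]
    # Residual self-attention: each token attends over the whole input.
    hidden = [tuple(x_j + a_j for x_j, a_j in zip(x, attend(x, xs, xs))) for x in xs]
    seed = (1.0, 0.5)  # hypothetical fixed decoder query
    return attend(seed, hidden, hidden)


def same(u, v):
    """Compare two vectors up to floating-point rounding."""
    return all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(u, v))


reordered = same(encode_then_read("aab"), encode_then_read("aba"))      # same symbols, new order
repeated = same(encode_then_read("aab"), encode_then_read("aabaab"))    # same proportions, twice as long
different = same(encode_then_read("aab"), encode_then_read("abb"))      # different proportions
```

In this sketch `reordered` and `repeated` come out true and `different` false. Since "aab" and "aabaab" are indistinguishable to the model, it cannot separate odd-length from even-length inputs, a distinction that a regular language can make.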
This review was created by AI and reviewed by human editors.