QUICK REVIEW

[Paper Review] The neural network pushdown automaton: model, stack and learning simulations

Guo-Zheng Sun, C. Lee Giles|arXiv (Cornell University)|Aug 1, 1993

Machine Learning and Algorithms|References: 41|Cited by: 32

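One-Sentence Summary

This paper proposes a neural network pushdown automaton (NNPDA) that couples a recurrent neural network with a continuous, differentiable stack in order to learn deterministic context-free grammars. The network and the analog stack are trained together by gradient descent on a joint error function; after training, a discrete pushdown automaton (PDA) can be extracted that correctly recognizes unseen strings of arbitrary length and is identical or equivalent to the PDA of the source grammar.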

ABSTRACT

In order for neural networks to learn complex languages or grammars, they must have sufficient computational power or resources to recognize or generate such languages. Though many approaches have been discussed, one obvious approach to enhancing the processing power of a recurrent neural network is to couple it with an external stack memory - in effect creating a neural network pushdown automata (NNPDA). This paper discusses in detail this NNPDA - its construction, how it can be trained and how useful symbolic information can be extracted from the trained network. In order to couple the external stack to the neural network, an optimization method is developed which uses an error function that connects the learning of the state automaton of the neural network to the learning of the operation of the external stack. To minimize the error function using gradient descent learning, an analog stack is designed such that the action and storage of information in the stack are continuous. One interpretation of a continuous stack is the probabilistic storage of and action on data. After training on sample strings of an unknown source grammar, a quantization procedure extracts from the analog stack and neural network a discrete pushdown automata (PDA). Simulations show that in learning deterministic context-free grammars - the balanced parenthesis language, 1^n0^n, and the deterministic Palindrome - the extracted PDA is correct in the sense that it can correctly recognize unseen strings of arbitrary length. In addition, the extracted PDAs can be shown to be identical or equivalent to the PDAs of the source grammars which were used to generate the training strings.

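Motivation and Objectives

Augment a recurrent neural network with an external stack memory to increase its computational power for learning complex grammars.
Develop a differentiable stack mechanism that supports end-to-end training by gradient descent.
Extract a discrete, symbolic pushdown automaton (PDA) from the trained network and stack.
Show that the extracted PDA correctly recognizes unseen strings of deterministic context-free grammars.
Verify that the extracted PDA is identical or equivalent to the PDA of the source grammar that generated the training data.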

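Proposed Method

Design a continuous (analog) stack in which both the stack actions (push and pop) and the stored data are continuous quantities.
Define a joint error function that couples the learning of the network's state automaton to the learning of the stack operations.
Minimize this error function by gradient descent, so that the network and the stack operations are trained together.
Apply a quantization procedure that converts the trained network and analog stack into a discrete pushdown automaton (PDA).
Train the NNPDA on sample strings of an unknown deterministic context-free grammar, such as 1^n0^n or the deterministic palindrome language.
Validate the extracted PDA by testing whether it recognizes unseen strings of arbitrary length.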
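The continuous stack in the first step can be pictured with a small sketch. This is only an illustration under stated assumptions, not the paper's implementation: the class name ContinuousStack, the method names push, pop, read and apply, and the idea of storing each symbol with a fractional "thickness" are choices made here for clarity. It assumes a signed action value where a positive value pushes that much of a symbol, a negative value pops that much thickness, and a read looks at the top unit of thickness.

```python
from dataclasses import dataclass, field


@dataclass
class ContinuousStack:
    """Toy analog stack (hypothetical names): entries are (symbol, thickness)."""
    items: list = field(default_factory=list)  # bottom of the stack comes first

    def push(self, symbol, amount):
        """Place `symbol` on top with fractional thickness `amount`."""
        if amount > 0:
            self.items.append((symbol, amount))

    def pop(self, amount):
        """Remove a total thickness of `amount` from the top of the stack."""
        while amount > 1e-12 and self.items:
            symbol, thickness = self.items.pop()
            if thickness > amount:
                # Only part of this entry is removed; keep the remainder.
                self.items.append((symbol, thickness - amount))
                amount = 0.0
            else:
                amount -= thickness

    def read(self, depth=1.0):
        """Return how much of each symbol lies in the top `depth` of thickness."""
        reading = {}
        remaining = depth
        for symbol, thickness in reversed(self.items):
            if remaining <= 1e-12:
                break
            taken = min(thickness, remaining)
            reading[symbol] = reading.get(symbol, 0.0) + taken
            remaining -= taken
        return reading

    def apply(self, action, symbol):
        """Signed action (assumed): positive pushes, negative pops, zero does nothing."""
        if action > 0:
            self.push(symbol, action)
        elif action < 0:
            self.pop(-action)


stack = ContinuousStack()
stack.apply(1.0, "a")   # push a full unit of "a"
stack.apply(0.4, "b")   # push 0.4 of "b" on top
print(stack.read())     # the top unit mixes "b" and "a"
stack.apply(-0.5, "a")  # pop 0.5: removes all of "b" and part of "a"
print(stack.read())
```

Because pushes and pops are fractional, a small change in the action value produces a small change in what is read from the stack, which is the property that lets an error signal reach the stack operations during training. When every action is exactly +1, 0 or -1, the sketch behaves like an ordinary discrete stack.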

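Experimental Results

Research Questions

RQ1: Can a neural network with a differentiable stack learn to recognize deterministic context-free languages such as balanced parentheses and palindromes?
RQ2: Can a symbolic, discrete PDA be extracted from the trained network and analog stack?
RQ3: Does the extracted PDA correctly recognize unseen strings of arbitrary length from the target grammar?
RQ4: Is the extracted PDA identical or equivalent to the PDA of the source grammar that generated the training data?
RQ5: Can gradient-based learning train the network and the stack operations jointly within a single framework?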

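Key Findings

The NNPDA learned to recognize unseen strings of arbitrary length from deterministic context-free grammars: the balanced parenthesis language, 1^n0^n, and the deterministic palindrome language.
After training, the quantization procedure yields a discrete PDA that correctly recognizes unseen test strings of the target grammar.
The extracted PDAs are identical or equivalent to the PDAs of the source grammars that generated the training strings.
The continuous stack makes the stack operations trainable by gradient descent, so network and stack behavior are optimized jointly.
The model shows that symbolic knowledge, in the form of a discrete PDA, can be extracted from a differentiable neural network with external memory.
The extracted PDA generalizes to strings longer than those seen during training, which indicates that the underlying grammar was learned rather than the training samples memorized.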
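To make concrete what recognizing strings of arbitrary length means for a discrete PDA, here is a hand-written recognizer for 1^n0^n. It is an illustrative sketch, not the automaton extracted in the paper: the function name accepts_1n0n, the two state labels, and the convention n >= 1 are assumptions made here.

```python
def accepts_1n0n(string):
    """Deterministic PDA sketch for { 1^n 0^n : n >= 1 } (n >= 1 is assumed)."""
    stack = []
    state = "push"  # "push" while reading 1s, "pop" once the 0s have started
    for symbol in string:
        if state == "push" and symbol == "1":
            stack.append("1")   # remember one unmatched 1
        elif symbol == "0" and stack:
            state = "pop"
            stack.pop()         # match this 0 against a stored 1
        else:
            return False        # no transition defined: reject
    # Accept only if at least one 0 was read and every 1 was matched.
    return state == "pop" and not stack


for example in ["10", "1100", "110", "100", "1010", ""]:
    print(repr(example), accepts_1n0n(example))
```

The same three rules handle a string of any length, for example "1" * 5000 + "0" * 5000, because the count is held on the stack rather than in a fixed number of states. This is why a correctly extracted PDA is not limited to the string lengths seen in training.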

This review was generated by AI and checked by human editors.