QUICK REVIEW

[Paper Review] Revisiting the poverty of the stimulus: hierarchical generalization without a hierarchical bias in recurrent neural networks

R. Thomas McCoy, Robert Frank | arXiv (Cornell University) | Feb 25, 2018

Natural Language Processing Techniques | References: 19 | Cited by: 47

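One-sentence summary

This paper tests whether recurrent neural networks with no built-in hierarchical bias can learn hierarchical question formation, and finds that a GRU with attention generalizes hierarchically, especially when the input contains agreement cues.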

ABSTRACT

Syntactic rules in natural language typically need to make reference to hierarchical sentence structure. However, the simple examples that language learners receive are often equally compatible with linear rules. Children consistently ignore these linear explanations and settle instead on the correct hierarchical one. This fact has motivated the proposal that the learner's hypothesis space is constrained to include only hierarchical rules. We examine this proposal using recurrent neural networks (RNNs), which are not constrained in such a way. We simulate the acquisition of question formation, a hierarchical transformation, in a fragment of English. We find that some RNN architectures tend to learn the hierarchical rule, suggesting that hierarchical cues within the language, combined with the implicit architectural biases inherent in certain RNNs, may be sufficient to induce hierarchical generalizations. The likelihood of acquiring the hierarchical generalization increased when the language included an additional cue to hierarchy in the form of subject-verb agreement, underscoring the role of cues to hierarchy in the learner's input.

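Motivation and goals

Assess whether RNNs that lack a hierarchical bias can learn hierarchical question formation from limited data.
Evaluate several RNN architectures on language fragments with and without subject-verb agreement.
Investigate how hierarchical cues in the input affect the emergence of hierarchical generalization.
Analyze how architecture type and initialization affect generalization behavior.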

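Proposed method

Use sequence-to-sequence RNNs (encoder-decoder) to model declarative sentences and their question forms.
Test six architectures: SRN, GRU, and LSTM, each with and without attention, on two language fragments (no-agreement and agreement).
Train 100 networks per architecture on each of the two fragments (6 x 2 x 100 = 1,200 networks in total) on 120,000 sentences, and evaluate them on a 10,000-sentence test set and a 10,000-sentence generalization set.
Train on two tasks: identity (IDENT) and question formation (QUEST); the generalization set contains held-out sentence types that distinguish the linear hypothesis from the hierarchical one.
On the generalization set, examine the first auxiliary of each output in the cases where the linear and hierarchical rules predict different auxiliaries, to determine whether the prediction aligns with the hierarchical rule or with the linear rule.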
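
The contrast between the two rules, and the first-auxiliary check applied to the generalization set, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the toy auxiliary set, and the example sentence are not taken from the authors' code, and the position of the main-clause auxiliary is supplied by hand rather than computed by a parser.

```python
# Toy auxiliary inventory (an assumption for this sketch).
AUXILIARIES = {"can", "could", "will", "would"}


def move_first(tokens):
    """Linear rule: front the linearly first auxiliary in the sentence."""
    i = next(k for k, tok in enumerate(tokens) if tok in AUXILIARIES)
    return [tokens[i]] + tokens[:i] + tokens[i + 1:]


def move_main(tokens, main_aux_index):
    """Hierarchical rule: front the main-clause auxiliary.

    The index is passed in by the caller; finding it automatically
    would require a syntactic analysis that this sketch does not attempt.
    """
    i = main_aux_index
    return [tokens[i]] + tokens[:i] + tokens[i + 1:]


def classify_first_aux(output_tokens, declarative, main_aux_index):
    """Label a model output by which rule its first word is consistent with."""
    first_linear = move_first(declarative)[0]
    first_hier = declarative[main_aux_index]
    if first_linear == first_hier:
        return "ambiguous"  # both rules predict the same first auxiliary
    first = output_tokens[0] if output_tokens else None
    if first == first_hier:
        return "hierarchical"
    if first == first_linear:
        return "linear"
    return "other"


# Illustrative declarative with a relative clause on the subject.
declarative = "my walrus that will eat can giggle".split()
linear_question = move_first(declarative)
# -> will my walrus that eat can giggle
hierarchical_question = move_main(declarative, main_aux_index=5)
# -> can my walrus that will eat giggle
```

Checking only the first output word keeps the comparison focused on which auxiliary was fronted, independently of errors later in the output.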

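Experimental results

Research questions

RQ1: Can GRU, LSTM, and GRU-with-attention networks learn hierarchical subject-auxiliary inversion without an explicit hierarchical bias?
RQ2: Does providing a hierarchical cue (subject-verb agreement) increase the likelihood of hierarchical generalization?
RQ3: How do different RNN architectures and initializations affect hierarchical generalization?
RQ4: What errors do the networks make compared with human errors in syntactic generalization, and what learning biases do those errors reveal?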

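Key findings

Every architecture except the vanilla SRN exceeded 94% accuracy on the test set; the best, the LSTM without attention, reached 99.9%.
On the generalization set, even the best architecture (GRU with attention) produced fully correct questions only about 13% of the time.
Adding agreement to the input increased the likelihood of hierarchical generalization across architectures.
Initialization affected accuracy within each architecture, indicating that the bias is not uniformly strong across random initializations.
The GRU with attention showed a qualitatively different pattern, favoring hierarchical generalization where the other architectures mostly followed the linear rule; attention had an effect on GRU.concat.
The GRU with attention encoded information beyond linear order, suggesting that its generalization relies on hierarchical cues rather than on a purely linear representation.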
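
One way to summarize variation across random initializations is to label each trained network by the rule it follows on most unambiguous generalization items, then report the share of networks that prefer the hierarchical rule. The sketch below is an assumption about how such a tally could be done, not the paper's analysis code; the function names are hypothetical, and the toy labels are made up for illustration and are not results from the paper.

```python
from collections import Counter


def network_preference(labels):
    """Return the rule one network follows on most unambiguous items.

    `labels` holds one string per generalization item; only
    "hierarchical" and "linear" count toward the preference.
    """
    counts = Counter(l for l in labels if l in ("hierarchical", "linear"))
    if not counts:
        return "undetermined"
    if counts["hierarchical"] == counts["linear"]:
        return "tie"
    return max(counts, key=counts.get)


def share_hierarchical(per_network_labels):
    """Fraction of networks whose majority behavior is hierarchical."""
    if not per_network_labels:
        raise ValueError("need at least one network")
    prefs = [network_preference(labels) for labels in per_network_labels]
    return sum(p == "hierarchical" for p in prefs) / len(prefs)


# Hypothetical labels for four differently initialized networks.
toy_runs = [
    ["hierarchical", "hierarchical", "linear"],
    ["linear", "linear", "other"],
    ["linear"],
    ["hierarchical", "other"],
]
toy_share = share_hierarchical(toy_runs)
```

Reporting a per-network preference rather than one pooled accuracy makes it visible when an architecture's average hides a split between initializations that behave differently.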


This review was generated by AI and checked by a human editor.