QUICK REVIEW

[Paper Review] Still not systematic after all these years: On the compositional skills of sequence-to-sequence recurrent networks

Brenden M. Lake, Marco Baroni|arXiv (Cornell University)|Oct 31, 2017

Natural Language Processing Techniques | Cited by 51

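ONE-SENTENCE SUMMARY

This paper introduces the SCAN domain to evaluate the compositional generalization of sequence-to-sequence RNNs. It shows that, although RNNs perform strongly on commands similar to those seen in training, they fail badly when systematic generalization is required, for example when a new verb must be combined with a modifier, which highlights a core limitation of neural networks with respect to systematicity. The results suggest that this lack of systematic generalization may be one reason neural models need very large amounts of training data.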

ABSTRACT

Humans can understand and produce new utterances effortlessly, thanks to their systematic compositional skills. Once a person learns the meaning of a new verb "dax", he or she can immediately understand the meaning of "dax twice" or "sing and dax". In this paper, we introduce the SCAN domain, consisting of a set of simple compositional navigation commands paired with the corresponding action sequences. We then test the zero-shot generalization capabilities of a variety of recurrent neural networks (RNNs) trained on SCAN with sequence-to-sequence methods. We find that RNNs can generalize well when the differences between training and test commands are small, so that they can apply mix-and-match strategies to solve the task. However, when generalization requires systematic compositional skills (as in the "dax" example above), RNNs fail spectacularly. We conclude with a proof-of-concept experiment in neural machine translation, supporting the conjecture that lack of systematicity is an important factor explaining why neural networks need very large training sets.

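RESEARCH MOTIVATION AND GOALS

Investigate whether recurrent neural networks (RNNs) can learn the kind of systematic compositional generalization that underlies human language understanding.
Identify the limits of zero-shot generalization in sequence-to-sequence RNNs when known elements must be combined in entirely new ways.
Assess whether the failure of RNNs to generalize systematically is a key reason neural models need large-scale training data.
Provide a benchmark domain, SCAN, that isolates compositional generalization from other sources of linguistic complexity.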

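PROPOSED METHOD

The authors introduce the SCAN domain, a synthetic navigation environment in which compositional commands are paired with their corresponding action sequences.
They train a variety of sequence-to-sequence RNN architectures on a diverse set of SCAN training commands.
Generalization is evaluated zero-shot on new combinations of known components, for example by pairing a new verb with a modifier.
Model performance is measured both on simple variations of training commands and on harder compositional generalization tasks.
They run a proof-of-concept experiment in neural machine translation to test whether limits on systematicity affect data efficiency.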
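
To make the setup above concrete, here is a minimal sketch of a SCAN-like command interpreter and a held-out-verb split. It is an illustration only and makes several assumptions: it covers a small subset of commands (single verbs, the repetition modifiers "twice" and "thrice", and the conjunction "and"), not the full SCAN grammar, and the action token names and the helper names `interpret` and `compositional_split` are hypothetical rather than taken from the paper.

```python
# Toy interpreter for a SCAN-like command language.
# Simplified subset for illustration; token names are assumptions.
PRIMITIVES = {"walk": "WALK", "run": "RUN", "look": "LOOK", "jump": "JUMP"}
REPEATS = {"twice": 2, "thrice": 3}


def interpret(command: str) -> list[str]:
    """Map a command string to its action sequence, clause by clause."""
    actions: list[str] = []
    for clause in command.split(" and "):
        words = clause.split()
        verb = PRIMITIVES[words[0]]
        count = REPEATS[words[1]] if len(words) > 1 else 1
        actions.extend([verb] * count)
    return actions


def compositional_split(commands, held_out_verb):
    """Train: commands without the held-out verb, plus that verb in isolation.
    Test: every composed command that uses the held-out verb."""
    train, test = [], []
    for cmd in commands:
        if held_out_verb not in cmd.split() or cmd == held_out_verb:
            train.append(cmd)
        else:
            test.append(cmd)
    return train, test


commands = [
    "walk", "jump", "walk twice", "jump twice",
    "run and walk", "run and jump thrice",
]
train, test = compositional_split(commands, "jump")
print(train)  # ['walk', 'jump', 'walk twice', 'run and walk']
print(test)   # ['jump twice', 'run and jump thrice']
print(interpret("run and jump thrice"))  # ['RUN', 'JUMP', 'JUMP', 'JUMP']
```

In this sketch the model would see "jump" only as a standalone command during training, so every test command asks it to combine a familiar verb with modifiers it has only seen attached to other verbs.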

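EXPERIMENTAL RESULTS

Research Questions

RQ1: Can RNNs generalize systematically to unseen combinations of known linguistic components, for example by combining a new verb with a repetition modifier?
RQ2: How does zero-shot generalization performance change with the complexity of the compositional structure in the test set?
RQ3: To what extent does the failure of RNNs to generalize systematically explain their dependence on large training sets?
RQ4: Does the lack of systematic generalization also show up in a real-world neural machine translation task?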

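Key Findings

RNNs perform very well on simple variants of training commands, which indicates effective pattern matching and mix-and-match strategies.
When systematic composition is required, RNNs fail completely at zero-shot generalization, for example at interpreting "dax twice" when "dax" is a new verb.
The failure does not stem from model capacity or architecture; it stems from an inability to combine the meanings of components in novel ways.
In neural machine translation, models trained on limited data fail to generalize systematically, which supports the conjecture that systematicity is a key bottleneck for data efficiency.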
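
The gap between the first two findings can be illustrated with a small scoring sketch. It assumes an exact-match criterion (a prediction counts only if the whole action sequence is right) and uses a pure lookup table as a stand-in for a model that only matches patterns it has seen; it is not an RNN and does not reproduce any result from the paper. All data, token names, and function names below are hypothetical.

```python
def exact_match_accuracy(predict, examples):
    """Fraction of (command, target) pairs predicted exactly right."""
    if not examples:
        return 0.0
    hits = sum(1 for cmd, target in examples if predict(cmd) == target)
    return hits / len(examples)


# Hypothetical training pairs: "dax" appears only in isolation.
train_pairs = [
    ("walk", ["WALK"]),
    ("walk twice", ["WALK", "WALK"]),
    ("dax", ["DAX"]),
]
# Hypothetical test pair: "dax" combined with a known modifier.
test_pairs = [("dax twice", ["DAX", "DAX"])]

memory = {cmd: target for cmd, target in train_pairs}


def lookup_model(command):
    """Pure memorization: stored output, or an empty sequence if unseen."""
    return memory.get(command, [])


print(exact_match_accuracy(lookup_model, train_pairs))  # 1.0
print(exact_match_accuracy(lookup_model, test_pairs))   # 0.0
```

A learner with systematic compositional skills would instead reuse what it knows about "twice" from "walk twice" and apply it to "dax", which is the behavior the findings above report as missing.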


This review was generated by AI and checked by human editors.