QUICK REVIEW

[Paper Review] Tracking the World State with Recurrent Entity Networks

Mikael Henaff, Jason Weston|arXiv (Cornell University)|Dec 12, 2016
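Topic Modeling | Cited by 157
One-Sentence Summary

This paper introduces the Recurrent Entity Network (EntNet), a memory-augmented model with parallel, dynamic memory slots for tracking the state of the world. It achieves state-of-the-art results on the bAbI tasks and performs competitively on the Children's Book Test (CBT) while reading each story in a single pass.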

ABSTRACT

We introduce a new model, the Recurrent Entity Network (EntNet). It is equipped with a dynamic long-term memory which allows it to maintain and update a representation of the state of the world as it receives new data. For language understanding tasks, it can reason on-the-fly as it reads text, not just when it is required to answer a question or respond as is the case for a Memory Network (Sukhbaatar et al., 2015). Like a Neural Turing Machine or Differentiable Neural Computer (Graves et al., 2014; 2016) it maintains a fixed size memory and can learn to perform location and content-based read and write operations. However, unlike those models it has a simple parallel architecture in which several memory locations can be updated simultaneously. The EntNet sets a new state-of-the-art on the bAbI tasks, and is the first method to solve all the tasks in the 10k training examples setting. We also demonstrate that it can solve a reasoning task which requires a large number of supporting facts, which other methods are not able to solve, and can generalize past its training horizon. It can also be practically used on large scale datasets such as Children's Book Test, where it obtains competitive performance, reading the story in a single pass.

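Motivation and Objectives

  • Address the need to maintain a dynamic representation of the world state while processing text.
  • Propose a memory-augmented neural network with parallel gated memory slots that update entity-specific representations.
  • Show that EntNet solves all bAbI tasks and generalizes to sequences longer than those seen during training.
  • Obtain competitive results on the Children's Book Test (CBT) with a single pass over the story.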

Proposed Method

  • Propose EntNet with a fixed number of memory slots, each with a key w_j and content h_j, updated by a gating mechanism conditioned on the input.
  • Use a parallel set of gated RNNs (memory blocks) with shared parameters to model concept-entity dynamics.
  • Define a content- and location-based gating function g_j = sigmoid(s_t^T h_j + s_t^T w_j) to determine per-slot updates.
  • Provide an input encoder that aggregates input tokens into a fixed-length vector s_t via a learned mask and summation.
  • Implement an output module akin to a one-hop Memory Network that weighs memories by q^T h_j and combines them to predict answers.
  • Train the entire system by backpropagation through time, propagating gradients from time steps requiring an output.
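
The encoder, gated update, and one-hop read described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation. The gate follows the formula in the list. The candidate update (a nonlinearity applied to U h_j + V w_j + W s_t), the unit-norm rescaling of each slot, the use of ReLU as the nonlinearity, and initializing each content vector to its key are assumptions that the bullets do not spell out. All function and variable names are hypothetical, the weights are random rather than trained, and the final projection from the read vector to answer logits is omitted.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(M, v):
    return [dot(row, v) for row in M]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def encode(token_embeddings, masks):
    """Input encoder: s_t = sum_i f_i * e_i, with one mask vector f_i per position."""
    d = len(token_embeddings[0])
    return [sum(f[k] * e[k] for f, e in zip(masks, token_embeddings))
            for k in range(d)]

def update_memory(h, w, s, U, V, W):
    """One time step: every slot is updated independently, so slots can run in parallel."""
    Ws = matvec(W, s)  # shared across slots, computed once
    new_h = []
    for h_j, w_j in zip(h, w):
        # Content term (s^T h_j) plus location term (s^T w_j), as in the gate formula.
        g = sigmoid(dot(s, h_j) + dot(s, w_j))
        Uh, Vw = matvec(U, h_j), matvec(V, w_j)
        # Candidate content; ReLU is an assumed stand-in for the nonlinearity.
        cand = [max(0.0, a + b + c) for a, b, c in zip(Uh, Vw, Ws)]
        updated = [x + g * c for x, c in zip(h_j, cand)]
        # Assumed normalization step: rescale the slot to unit length.
        norm = math.sqrt(dot(updated, updated)) or 1.0
        new_h.append([x / norm for x in updated])
    return new_h

def read_memory(h, q):
    """One-hop read: p_j = softmax(q^T h_j), u = sum_j p_j h_j."""
    scores = [dot(q, h_j) for h_j in h]
    top = max(scores)
    exps = [math.exp(sc - top) for sc in scores]
    total = sum(exps)
    p = [e / total for e in exps]
    u = [sum(p_j * h_j[k] for p_j, h_j in zip(p, h)) for k in range(len(q))]
    return p, u

# Toy run with random (untrained) parameters.
random.seed(0)
d, slots, n_tokens = 4, 3, 5

def rand_vec():
    return [random.uniform(-1.0, 1.0) for _ in range(d)]

def rand_mat():
    return [rand_vec() for _ in range(d)]

w = [rand_vec() for _ in range(slots)]   # keys w_j
h = [list(w_j) for w_j in w]             # assumption: contents start at the keys
U, V, W = rand_mat(), rand_mat(), rand_mat()  # shared by all slots

tokens = [rand_vec() for _ in range(n_tokens)]
masks = [rand_vec() for _ in range(n_tokens)]
s = encode(tokens, masks)

h = update_memory(h, w, s, U, V, W)      # write: one sentence read
p, u = read_memory(h, rand_vec())        # read: attention weights and summary
```

Because U, V, and W are shared and each slot depends only on its own key and content, the loop over slots has no cross-slot dependency. That independence is what lets several memory locations be updated simultaneously.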

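Experimental Results

Research Questions

  • RQ1: Can a fixed-size, parallel memory-augmented network maintain and update an internal world model while processing sequential text?
  • RQ2: Can EntNet scale to longer reasoning sequences, and does it generalize beyond its training horizon?
  • RQ3: How does EntNet compare with earlier memory architectures on a standard reasoning benchmark (bAbI) and on real-world data (CBT)?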

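Key Findings

  • EntNet solves all 20 bAbI tasks in the 10k-training-example setting, setting a new state of the art.
  • The model generalizes to sequences longer than those seen during training, which suggests it has learned the dynamics of the world.
  • On the synthetic World Model task, EntNet outperforms MemN2N and LSTM as sequence length grows, and it generalizes beyond its training horizon.
  • EntNet obtains competitive results on CBT; a simplified variant is the best-performing single-pass model on the Named Entities and Common Nouns tasks.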

This review was generated by AI and reviewed by human editors.