QUICK REVIEW

[Paper Review] Learning to Remember Rare Events

Łukasz Kaiser, Ofir Nachum|arXiv (Cornell University)|Mar 9, 2017

Domain Adaptation and Few-Shot Learning | References: 20 | Cited by: 237

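One-Sentence Summary

The paper introduces a scalable life-long memory module for neural networks. It enables life-long one-shot learning through fast nearest-neighbor lookup over a learned key-value memory, sets a new state of the art on Omniglot, and improves translation through memory-based one-shot capability.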

ABSTRACT

Despite recent advances, memory-augmented deep neural networks are still limited when it comes to life-long and one-shot learning, especially in remembering rare events. We present a large-scale life-long memory module for use in deep learning. The module exploits fast nearest-neighbor algorithms for efficiency and thus scales to large memory sizes. Except for the nearest-neighbor query, the module is fully differentiable and trained end-to-end with no extra supervision. It operates in a life-long manner, i.e., without the need to reset it during training. Our memory module can be easily added to any part of a supervised neural network. To show its versatility we add it to a number of networks, from simple convolutional ones tested on image classification to deep sequence-to-sequence and recurrent-convolutional models. In all cases, the enhanced network gains the ability to remember and do life-long one-shot learning. Our module remembers training examples shown many thousands of steps in the past and it can successfully generalize from them. We set new state-of-the-art for one-shot learning on the Omniglot dataset and demonstrate, for the first time, life-long one-shot learning in recurrent neural networks on a large-scale machine translation task.

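Motivation and Objectives

Motivate the challenge of learning rare events in a life-long setting and outline an approach to it.
Propose a differentiable memory module that stores key-value pairs and updates them during training.
Enable one-shot learning at inference time through nearest-neighbor lookup over the memory keys.
Demonstrate the module's versatility by integrating it into a CNN, Seq2Seq, and GNMT, with evaluation on Omniglot, a synthetic task, and a translation task.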

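Proposed Method

The memory module stores keys K, values V, and ages A as a memory M of size memory-size.
A normalized query q retrieves its k=256 nearest neighbors by cosine similarity; the module returns the value V of the nearest neighbor together with a similarity signal based on softmax weights.
The memory loss is a margin-based triplet objective that compares the positive and negative neighbors, pulling q toward the correct key and pushing it away from the incorrect one.
Memory update: if the retrieved value matches the target v, the key is updated by averaging it with q; otherwise (q, v) is written into the oldest memory slot (selected with a small random perturbation).
For large memories, nearest neighbors are found either exactly by computing QK^T or approximately, and more efficiently, via locality-sensitive hashing (LSH).
The module is applied across architectures: a simple CNN, a GNMT-style seq2seq model, and the Extended Neural GPU, demonstrating broad compatibility.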
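
The query, loss, and update rules above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `Memory` and its methods `query`, `loss`, and `update` are hypothetical, the search is brute force rather than LSH, returning `None` when no positive or negative neighbor is retrieved is a simplification of this sketch, and the random perturbation is interpreted here as tie-breaking noise when picking the oldest slot.

```python
import math
import random

def normalize(vec):
    """Scale a vector to unit L2 norm so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # guard against a zero vector
    return [x / norm for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

class Memory:
    """Hypothetical key-value-age memory; names and defaults are illustrative."""

    def __init__(self, size, dim, k=256, margin=0.1, seed=0):
        self.rng = random.Random(seed)
        # Keys start as random unit vectors; None marks a slot with no value yet.
        self.keys = [normalize([self.rng.gauss(0, 1) for _ in range(dim)])
                     for _ in range(size)]
        self.values = [None] * size
        self.ages = [0] * size
        self.k = min(k, size)
        self.margin = margin  # alpha in the summary

    def query(self, q):
        """Return (value of the nearest key, top-k slot indices, their similarities)."""
        q = normalize(q)
        sims = [dot(q, key) for key in self.keys]  # exact search, one row of QK^T
        top = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:self.k]
        return self.values[top[0]], top, [sims[i] for i in top]

    def loss(self, q, target):
        """Margin-based triplet loss on the nearest positive and negative neighbors."""
        _, top, sims = self.query(q)
        pos = next((s for i, s in zip(top, sims) if self.values[i] == target), None)
        neg = next((s for i, s in zip(top, sims) if self.values[i] != target), None)
        if pos is None or neg is None:
            return None  # simplification of this sketch: no triplet available
        return max(0.0, neg - pos + self.margin)

    def update(self, q, target):
        """Merge q into the nearest key on a match, else overwrite the oldest slot."""
        q = normalize(q)
        nearest_value, top, _ = self.query(q)
        self.ages = [age + 1 for age in self.ages]
        if nearest_value == target:
            slot = top[0]
            # Normalizing the sum is equivalent to normalizing the average.
            self.keys[slot] = normalize([a + b for a, b in zip(self.keys[slot], q)])
        else:
            # Noise in [0, 1) only breaks ties between equally old slots.
            slot = max(range(len(self.ages)),
                       key=lambda i: self.ages[i] + self.rng.random())
            self.keys[slot] = q
            self.values[slot] = target
        self.ages[slot] = 0

# Usage: store two one-shot examples, then look up a nearby query.
memory = Memory(size=2, dim=3, k=2)
memory.update([1.0, 0.0, 0.0], "a")
memory.update([0.0, 1.0, 0.0], "b")
nearest_value, _, _ = memory.query([0.9, 0.1, 0.0])
print(nearest_value)
```

A real implementation would hold K, V, and A as tensors and batch the similarity computation; lists are used here only to keep the sketch dependency-free.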

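Experimental Results

Research Questions

RQ1: Can a differentiable, scalable memory module enable life-long one-shot learning across different neural network architectures?
RQ2: Does integrating the memory improve performance on a standard one-shot task (Omniglot) and on a synthetic life-long task, and does it help in large-scale translation?
RQ3: How does the memory affect learning and generalization when rare events or words appear?
RQ4: In translation and other sequence tasks, what are the practical effects of one-shot, life-long learning, and which metrics evaluate them?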

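Key Findings

The memory-augmented model achieves strong one-shot learning on Omniglot, approaching or matching state-of-the-art results.
On the synthetic task that requires memory, the memory-augmented model significantly outperforms the baseline and a standard seq2seq model.
In GNMT English-German translation, the memory-augmented model matches the baseline BLEU score and shows one-shot gains when the memory is used as context; exposing the entire test set as memory context yields a substantial BLEU improvement (more than 8 points).
Qualitative examples show that the memory module can translate rare words such as Dostoevsky, which the baseline model struggles with.
Across architectures and tasks, a single set of memory parameters (k=256, α=0.1) gives good results, reflecting the module's generality.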

This review was generated by AI and reviewed by a human editor.