QUICK REVIEW

[Paper Review] Incorporating Copying Mechanism in Sequence-to-Sequence Learning

Jiatao Gu, Zhengdong Lu | arXiv (Cornell University) | Mar 21, 2016

Topic Modeling | References: 15 | Cited by: 258

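One-Sentence Summary

CopyNet extends seq2seq with a differentiable copying mechanism that can copy subsequences from the input, improving performance on synthetic pattern learning, text summarization, and single-turn dialogue.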

ABSTRACT

We address an important problem in sequence-to-sequence (Seq2Seq) learning referred to as copying, in which certain segments in the input sequence are selectively replicated in the output sequence. A similar phenomenon is observable in human language communication. For example, humans tend to repeat entity names or even long phrases in conversation. The challenge with regard to copying in Seq2Seq is that new machinery is needed to decide when to perform the operation. In this paper, we incorporate copying into neural network-based Seq2Seq learning and propose a new model called CopyNet with encoder-decoder structure. CopyNet can nicely integrate the regular way of word generation in the decoder with the new copying mechanism which can choose sub-sequences in the input sequence and put them at proper places in the output sequence. Our empirical study on both synthetic data sets and real world data sets demonstrates the efficacy of CopyNet. For example, CopyNet can outperform regular RNN-based model with remarkable margins on text summarization tasks.

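Motivation and Objectives

Show why Seq2Seq tasks need accurate copying of input subsequences such as entity names and dates.
Propose a unified encoder-decoder model (CopyNet) that integrates generation and copying in a single differentiable framework.
Demonstrate the effectiveness of CopyNet on synthetic, summarization, and dialogue datasets.
Show that copying source-side content improves the handling of out-of-vocabulary (OOV) words.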

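Proposed Method

Introduce CopyNet, an encoder-decoder architecture with a mixed prediction model that combines a generate mode and a copy mode.
Define the copy-mode score from the source hidden states, so the model can select which input subsequence to copy (Eq. 6).
Define the generate-mode score from the standard decoder output with a parameterized vocabulary score (Eq. 7).
Compute a normalization factor Z shared by the two modes, so that generation and copying compete within a single softmax (Eqs. 4-6).
Use a hybrid addressing mechanism over the source memory M that combines an attentive read (content-based) with a selective read (location-based) (Sections 3.3-3.4).
Update the decoder state with the previous word embedding and a location-aware selective-read vector that guides subsequent steps (Eq. 9).
Train end-to-end by minimizing the negative log-likelihood, with no additional mode labels required (Eq. 10).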
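
As a rough illustration of the shared normalization and the selective read described above, the following Python sketch mixes generate-mode and copy-mode scores under one normalizer Z and then builds a selective-read vector. The function names (`mixed_probs`, `selective_read`), the toy inputs, and the exact selective-read weighting (copy probabilities renormalized over the source positions that match the previous output word) are assumptions made for illustration. The learned score functions are replaced by plain numbers passed in by the caller, so this is a sketch of the mixing step only, not the authors' implementation.

```python
import math


def mixed_probs(vocab, gen_scores, source_tokens, copy_scores):
    """Return (word_probs, copy_position_probs) under one shared normalizer Z.

    vocab / gen_scores: vocabulary words and their generate-mode scores.
    source_tokens / copy_scores: source words and their copy-mode scores,
    one score per source position.
    """
    # Subtract the global maximum before exponentiating for numerical stability.
    top = max(gen_scores + copy_scores)
    gen_exp = [math.exp(s - top) for s in gen_scores]
    copy_exp = [math.exp(s - top) for s in copy_scores]
    z = sum(gen_exp) + sum(copy_exp)  # shared by both modes

    # Generate mode contributes one term per vocabulary word.
    word_probs = {w: e / z for w, e in zip(vocab, gen_exp)}

    # Copy mode contributes one term per source position. Repeated source
    # words accumulate, and OOV source words receive mass from copying only.
    copy_position_probs = [e / z for e in copy_exp]
    for token, p in zip(source_tokens, copy_position_probs):
        word_probs[token] = word_probs.get(token, 0.0) + p
    return word_probs, copy_position_probs


def selective_read(prev_word, source_tokens, copy_position_probs, source_states):
    """Weighted sum of source states at positions whose token equals prev_word.

    Assumed weighting: copy probabilities renormalized over matching positions.
    Returns a zero vector when prev_word does not occur in the source.
    """
    weights = [p if token == prev_word else 0.0
               for token, p in zip(source_tokens, copy_position_probs)]
    k = sum(weights)
    dim = len(source_states[0])
    if k == 0.0:
        return [0.0] * dim
    return [sum(w / k * h[d] for w, h in zip(weights, source_states))
            for d in range(dim)]


if __name__ == "__main__":
    # Toy example: "Tony" is not in the vocabulary, so it can only be copied.
    vocab = ["hello", "my", "name", "is", "<unk>"]
    gen_scores = [0.1, 0.2, 0.0, 0.3, 0.5]
    source_tokens = ["my", "name", "is", "Tony"]
    copy_scores = [0.0, 0.1, 0.2, 2.0]
    probs, copy_pos = mixed_probs(vocab, gen_scores, source_tokens, copy_scores)
    best = max(probs, key=probs.get)
    states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 2.0]]
    print(best, selective_read(best, source_tokens, copy_pos, states))
```

Because both modes share Z, a high copy score for a source position directly lowers the probability of every generated word, which is the competition the list above refers to. In a trained model the scores would come from learned projections of the decoder and encoder states rather than hand-picked numbers.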

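Experimental Results

Research Questions

RQ1: Does a differentiable copying mechanism improve Seq2Seq models on tasks that require faithful reproduction of input segments?
RQ2: How does CopyNet balance copying against generation, and how does it handle OOV words through source-side copying?
RQ3: Does CopyNet outperform standard encoder-decoder models, with and without attention, on synthetic, summarization, and dialogue datasets?
RQ4: What role does hybrid (content-based and location-based) addressing play in effective copying?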

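Key Findings

CopyNet significantly outperforms the standard Enc-Dec and RNNSearch on the synthetic copying task (Table 1).
On the LCSTS Chinese summarization task, CopyNet achieves higher ROUGE scores than the baselines, with both the +C and +W variants showing significant gains (Table 3).
In single-turn dialogue, CopyNet achieves higher Top-1 and Top-10 decoding accuracy than RNNSearch, especially when the test data share no substrings with the training data (Table 4).
CopyNet can copy longer OOV subsequences from the source, alleviating the open-vocabulary problem in abstractive tasks (text summarization and dialogue).
The model coordinates copy mode and generate mode precisely, often copying contiguous input segments and inserting generated content between them to form fluent output (case studies in the figures).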


This review was generated by AI and checked by a human editor.