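[Paper Walkthrough] Sequence to Sequence Learning with Neural Networks
A neural sequence-to-sequence model built from a deep (multilayer) LSTM encoder and decoder reaches 34.8 BLEU on WMT'14 English-to-French, outperforming a phrase-based SMT baseline (33.3 BLEU) in direct translation, and improves further to 36.5 BLEU when used to rescore that SMT system's 1000-best list.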
Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks. Although DNNs work well whenever large labeled training sets are available, they cannot be used to map sequences to sequences. In this paper, we present a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure. Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector. Our main result is that on an English to French translation task from the WMT'14 dataset, the translations produced by the LSTM achieve a BLEU score of 34.8 on the entire test set, where the LSTM's BLEU score was penalized on out-of-vocabulary words. Additionally, the LSTM did not have difficulty on long sentences. For comparison, a phrase-based SMT system achieves a BLEU score of 33.3 on the same dataset. When we used the LSTM to rerank the 1000 hypotheses produced by the aforementioned SMT system, its BLEU score increases to 36.5, which is close to the previous best result on this task. The LSTM also learned sensible phrase and sentence representations that are sensitive to word order and are relatively invariant to the active and the passive voice. Finally, we found that reversing the order of the words in all source sentences (but not target sentences) improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.
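The abstract's description can be written compactly. In the sketch below, $v$ is the fixed-dimensional vector the encoder LSTM produces from the input $x_1, \dots, x_T$, and the decoder LSTM defines the probability of the output $y_1, \dots, y_{T'}$ as a product of per-token distributions. The second line is the training objective (average log-probability of the correct translation $T$ given source $S$ over a training set $\mathcal{S}$), and the third is the decoding rule. The notation is assumed to follow the paper's; note that $T$ denotes the input length in the first line and a translation in the other two.

```latex
p(y_1, \dots, y_{T'} \mid x_1, \dots, x_T)
  = \prod_{t=1}^{T'} p(y_t \mid v, y_1, \dots, y_{t-1})

\frac{1}{|\mathcal{S}|} \sum_{(T, S) \in \mathcal{S}} \log p(T \mid S)

\hat{T} = \arg\max_{T} \; p(T \mid S)
```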
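Motivation and Objectives
- Present an end-to-end approach to sequence-to-sequence learning that maps an input sequence to an output sequence with minimal assumptions about sequence structure.
- Show that a deep LSTM encoder-decoder can translate text directly and can also improve an SMT system by rescoring its hypotheses.
- Study techniques that improve learning and translation quality, including reversing the source sentences and using multiple LSTM layers.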
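Proposed Method
- Encode the input sequence with a deep LSTM to obtain a fixed-dimensional vector representation.
- Decode the target sequence with a separate deep LSTM conditioned on that encoded representation.
- Generate translations with a left-to-right beam search decoder, which also yields p(T|S) for each translation T of a source sentence S.
- Train end-to-end by maximizing the log-probability of the correct translation given the source sentence over the training data.
- Reverse the word order of the source sentences (but not the target sentences) to shorten the minimal time lag between corresponding source and target words and ease optimization.
- Evaluate with BLEU on WMT'14 English-to-French, both for direct translation and for rescoring the SMT system's n-best lists.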
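The source reversal and left-to-right beam search steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_model` is a hypothetical stand-in for a trained decoder LSTM, and the `NextTokenFn` interface, the `<UNK>` token, and the probabilities are invented for the example. The search keeps `beam_size` partial hypotheses, extends each with every candidate token, and moves a hypothesis to the completed set as soon as `<EOS>` is appended.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

EOS = "<EOS>"

# Hypothetical interface: given the (reversed) source tokens and the target
# prefix so far, return a log-probability for each possible next token.
NextTokenFn = Callable[[Sequence[str], Sequence[str]], Dict[str, float]]
Hypothesis = Tuple[List[str], float]  # (tokens, cumulative log-probability)


def reverse_source(tokens: Sequence[str]) -> List[str]:
    """Reverse the source sentence only; target word order is left unchanged."""
    return list(reversed(tokens))


def beam_search(source: Sequence[str], next_log_probs: NextTokenFn,
                beam_size: int = 2, max_len: int = 10) -> List[Hypothesis]:
    """Left-to-right beam search; returns completed hypotheses, best first."""
    beam: List[Hypothesis] = [([], 0.0)]
    complete: List[Hypothesis] = []
    for _ in range(max_len):
        # Extend every partial hypothesis with every candidate next token.
        candidates: List[Hypothesis] = []
        for prefix, score in beam:
            for token, log_p in next_log_probs(source, prefix).items():
                candidates.append((prefix + [token], score + log_p))
        candidates.sort(key=lambda c: c[1], reverse=True)
        # Keep the top beam_size; finished ones leave the beam.
        beam = []
        for tokens, score in candidates[:beam_size]:
            if tokens[-1] == EOS:
                complete.append((tokens, score))
            else:
                beam.append((tokens, score))
        if not beam:
            break
    complete.sort(key=lambda c: c[1], reverse=True)
    return complete


def toy_model(source: Sequence[str], prefix: Sequence[str]) -> Dict[str, float]:
    """Toy decoder (assumption): prefers copying the source in original order."""
    original = list(reversed(source))  # undo the reversal applied to the input
    i = len(prefix)
    if i < len(original):
        probs = {original[i]: 0.7, "<UNK>": 0.2, EOS: 0.1}
    else:
        probs = {EOS: 0.9, "<UNK>": 0.1}
    return {token: math.log(p) for token, p in probs.items()}


source = reverse_source(["a", "b", "c"])
hypotheses = beam_search(source, toy_model, beam_size=2)
best_tokens, best_log_prob = hypotheses[0]
print(best_tokens, best_log_prob)
```

The cumulative score returned with each hypothesis is log p(T|S) under the toy model, which is the same quantity a trained model would assign when scoring a candidate translation.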
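Experimental Results
Research Questions
- RQ1: Can a purely neural LSTM encoder-decoder perform direct sequence-to-sequence translation at large scale?
- RQ2: Does reversing the source input improve learning efficiency and translation quality in a seq2seq LSTM model?
- RQ3: On a large-scale task, how does neural sequence-to-sequence translation compare with a traditional SMT baseline, and how do the two complement each other?
Key Findings
| Method | Test BLEU score (ntst14) |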
|---|---|
| Baseline System [29] | 33.30 |
| Cho et al. [5] | 34.54 |
| Single forward LSTM, beam size 12 | 26.17 |
| Single reversed LSTM, beam size 12 | 30.59 |
| Ensemble of 5 reversed LSTMs, beam size 1 | 33.00 |
| Ensemble of 2 reversed LSTMs, beam size 12 | 33.27 |
| Ensemble of 5 reversed LSTMs, beam size 2 | 34.50 |
| Ensemble of 5 reversed LSTMs, beam size 12 | 34.81 |
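- An ensemble of 5 deep reversed LSTMs reaches 34.81 BLEU in direct translation on ntst14, above the SMT baseline's 33.30 BLEU.
- Rescoring the SMT baseline's 1000-best list with an ensemble of reversed LSTMs reaches 36.5 BLEU, close to the best previously published result on this task.
- A single reversed LSTM with beam size 12 reaches 30.59 BLEU, below the baseline; ensembling closes the gap (33.27 with 2 models) and surpasses the baseline with 5 models at beam size 2 (34.50) or 12 (34.81).
- Reversing the source sentences helped markedly in one setting: test BLEU rose from 25.9 to 30.6 and test perplexity fell from 5.8 to 4.7.
- The full model has 384M parameters, a 160k-word source vocabulary, and an 80k-word target vocabulary; it was trained with SGD and gradient clipping for 7.5 epochs.
- Performance did not degrade on long sentences; qualitative analysis suggests the learned representations are sensitive to word order and capture meaning.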
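The n-best rescoring result above can be illustrated with a short sketch. It assumes the combination rule is an even average of the SMT system's score and the LSTM's log-probability for each hypothesis, which is how the paper is understood to describe it; the function name `rescore_nbest`, the `weight` parameter, the example sentences, and all scores are hypothetical.

```python
from typing import Callable, List, Tuple

ScoredHypothesis = Tuple[str, float]  # (translation, score)


def rescore_nbest(nbest: List[ScoredHypothesis],
                  lstm_log_prob: Callable[[str], float],
                  weight: float = 0.5) -> List[ScoredHypothesis]:
    """Combine each SMT score with the LSTM log-probability and re-sort.

    weight=0.5 gives an even average of the two scores (assumed rule).
    """
    rescored = [
        (hyp, (1.0 - weight) * smt_score + weight * lstm_log_prob(hyp))
        for hyp, smt_score in nbest
    ]
    rescored.sort(key=lambda item: item[1], reverse=True)
    return rescored


# Made-up SMT n-best list: the SMT system slightly prefers the wrong word order.
nbest = [("le noir chat", -1.5), ("le chat noir", -2.0)]

# Made-up LSTM log-probabilities standing in for a trained model's log p(T|S).
fake_lstm_scores = {"le noir chat": -4.0, "le chat noir": -1.0}

rescored = rescore_nbest(nbest, fake_lstm_scores.__getitem__)
print(rescored[0])
```

In this toy list the SMT system ranks the misordered hypothesis first, and averaging in the second model's score moves the other hypothesis to the top, which is the kind of correction rescoring is meant to provide.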
This walkthrough was generated by AI and reviewed by a human editor.