QUICK REVIEW

[Paper Review] Deep Encoder, Shallow Decoder: Reevaluating Non-autoregressive Machine Translation

Jungo Kasai, Nikolaos Pappas|arXiv (Cornell University)|Jun 18, 2020

Natural Language Processing Techniques | References: 53 | Cited by: 64

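One-Sentence Summary

The paper argues that autoregressive (AR) models with a deep encoder and a shallow decoder can outperform strong non-autoregressive (NAR) models at comparable speed, and that conventional NAR evaluations overestimate the speed disadvantage of AR baselines because of suboptimal layer allocation, insufficient speed measurement, and a lack of knowledge distillation.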

ABSTRACT

Much recent effort has been invested in non-autoregressive neural machine translation, which appears to be an efficient alternative to state-of-the-art autoregressive machine translation on modern GPUs. In contrast to the latter, where generation is sequential, the former allows generation to be parallelized across target token positions. Some of the latest non-autoregressive models have achieved impressive translation quality-speed tradeoffs compared to autoregressive baselines. In this work, we reexamine this tradeoff and argue that autoregressive baselines can be substantially sped up without loss in accuracy. Specifically, we study autoregressive models with encoders and decoders of varied depths. Our extensive experiments show that given a sufficiently deep encoder, a single-layer autoregressive decoder can substantially outperform strong non-autoregressive models with comparable inference speed. We show that the speed disadvantage for autoregressive baselines compared to non-autoregressive methods has been overestimated in three aspects: suboptimal layer allocation, insufficient speed measurement, and lack of knowledge distillation. Our results establish a new protocol for future research toward fast, accurate machine translation. Our code is available at https://github.com/jungokasai/deep-shallow.

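Motivation and Goals

Question the speed-accuracy tradeoff and the evaluation practices of conventional NAR research.
Study how encoder/decoder depth allocation affects AR and NAR performance.
Assess the effect of knowledge distillation on AR and NAR baselines under a fair comparison.
Provide a new protocol for evaluating fast, accurate machine translation models.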

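Proposed Method

Systematically compare autoregressive (AR) and non-autoregressive (NAR) models across different encoder/decoder depths.
Introduce and evaluate deep-encoder, shallow-decoder configurations for both AR and NAR models.
Measure inference speed with two metrics: S1 (translating one sentence at a time) and Smax (translating with the largest batch size the hardware allows).
Apply sequence-level knowledge distillation to both AR and NAR baselines for a fair comparison.
Analyze complexity and discuss how the number of decoding iterations (T) affects total computation and speed.
Run large-scale experiments on multiple WMT translation directions with standard preprocessing and evaluation (BLEU, SacreBLEU).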
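
The two speed metrics above can be sketched as a small timing harness. This is only an illustration of the idea, not the paper's benchmarking code: `translate_batch` is a hypothetical callable standing in for any translation model, and `max_batch_size` is assumed to have been found beforehand (for example, by increasing the batch size until the hardware runs out of memory).

```python
import time
from typing import Callable, List, Sequence, Tuple


def sentences_per_second(
    translate_batch: Callable[[Sequence[str]], List[str]],  # hypothetical model wrapper
    sentences: Sequence[str],
    batch_size: int,
) -> float:
    """Translate `sentences` in chunks of `batch_size` and return the throughput."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    translated = 0
    start = time.perf_counter()
    for i in range(0, len(sentences), batch_size):
        batch = sentences[i : i + batch_size]
        translated += len(translate_batch(batch))
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse timers.
    return translated / elapsed if elapsed > 0 else float("inf")


def measure_s1_and_smax(
    translate_batch: Callable[[Sequence[str]], List[str]],
    sentences: Sequence[str],
    max_batch_size: int,  # assumption: largest batch that fits on the hardware
) -> Tuple[float, float]:
    """Return (S1, Smax) in sentences per second."""
    s1 = sentences_per_second(translate_batch, sentences, batch_size=1)
    smax = sentences_per_second(translate_batch, sentences, batch_size=max_batch_size)
    return s1, smax


if __name__ == "__main__":
    # Toy stand-in for a real model: it just upper-cases each sentence.
    def toy_translate(batch: Sequence[str]) -> List[str]:
        return [s.upper() for s in batch]

    s1, smax = measure_s1_and_smax(toy_translate, ["hello world"] * 1000, max_batch_size=128)
    print(f"S1: {s1:.0f} sent/s, Smax: {smax:.0f} sent/s")
```

Reporting both numbers matters because a model that looks fast at batch size 1 may lose its advantage once the hardware is saturated with large batches.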

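Experimental Results

Research Questions

RQ1: Does a deep encoder paired with a shallow decoder give AR models a better speed-quality tradeoff than NAR models?
RQ2: How does the choice of speed measurement (S1 vs. Smax) affect the perceived advantage of AR relative to NAR?
RQ3: How does encoder/decoder layer allocation affect translation quality and decoding speed?
RQ4: Does knowledge distillation need to be applied equally to AR and NAR models to ensure a fair comparison?
RQ5: How much can AR models be sped up relative to strong NAR methods without sacrificing accuracy?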

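Key Findings

AR models with a deep encoder and a shallow decoder match the strong 6-6 AR baseline in BLEU while decoding substantially faster under S1.
In the deep-encoder, shallow-decoder configuration, NAR models generally trail AR models in BLEU, and they are also slower than the AR baseline under Smax.
The AR speedup remains robust under large-batch decoding, whereas the NAR speedup shrinks as the batch size grows.
Knowledge distillation benefits both model families, but the accuracy gap between AR and NAR remains large and widens further when distillation is applied to both.
Word reordering and the number of decoder layers are the key factors behind NAR models needing deeper decoders to perform well.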
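
A rough way to see why layer allocation and batch size matter is to count layer passes. The sketch below is a simplified back-of-the-envelope model, not the paper's complexity analysis: it treats every layer pass as equally costly, ignores attention cost and caching details, and uses illustrative values (a 12-1 deep-shallow layout, a 25-token target, and T = 4 iterations) that are assumptions made here for the example.

```python
def ar_sequential_steps(encoder_layers: int, decoder_layers: int, target_length: int) -> int:
    """Layer passes that must run one after another for an AR model.

    The encoder runs once; the decoder runs once per generated token.
    """
    return encoder_layers + decoder_layers * target_length


def nar_sequential_steps(encoder_layers: int, decoder_layers: int, iterations: int) -> int:
    """Layer passes that must run one after another for an iterative NAR model.

    The encoder runs once; the decoder runs once per refinement iteration T,
    covering all target positions in parallel.
    """
    return encoder_layers + decoder_layers * iterations


def ar_decoder_work(decoder_layers: int, target_length: int) -> int:
    """Total (position, layer) decoder computations for AR: each token is decoded once."""
    return decoder_layers * target_length


def nar_decoder_work(decoder_layers: int, target_length: int, iterations: int) -> int:
    """Total (position, layer) decoder computations for NAR: every position is recomputed each iteration."""
    return decoder_layers * target_length * iterations


if __name__ == "__main__":
    n = 25  # illustrative target length (assumption)
    t = 4   # illustrative number of NAR iterations (assumption)

    print(ar_sequential_steps(6, 6, n))   # 156: 6-6 AR baseline
    print(ar_sequential_steps(12, 1, n))  # 37: deep-shallow AR (illustrative 12-1 layout)
    print(nar_sequential_steps(6, 6, t))  # 30: 6-6 NAR

    print(ar_decoder_work(6, n))          # 150
    print(ar_decoder_work(1, n))          # 25
    print(nar_decoder_work(6, n, t))      # 600
```

Under these assumptions, moving decoder layers into the encoder cuts the AR model's sequential steps sharply, which lines up with the S1 finding. The NAR model's total decoder work grows with T, which is one plausible reading of why its advantage shrinks once large batches already keep the hardware busy.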

This review was generated by AI and reviewed by human editors.