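[Paper Summary] Zero-Shot Cross-lingual Classification Using Multilingual Neural Machine Translation
This paper reuses a multilingual NMT encoder to build an Encoder-Classifier for cross-lingual transfer. The model performs strongly on zero-shot French classification and achieves competitive gains on English tasks on Amazon Reviews, SST, and SNLI.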
Transferring representations from large supervised tasks to downstream tasks has shown promising results in AI fields such as Computer Vision and Natural Language Processing (NLP). In parallel, the recent progress in Machine Translation (MT) has enabled one to train multilingual Neural MT (NMT) systems that can translate between multiple languages and are also capable of performing zero-shot translation. However, little attention has been paid to leveraging representations learned by a multilingual NMT system to enable zero-shot multilinguality in other NLP tasks. In this paper, we demonstrate a simple framework, a multilingual Encoder-Classifier, for cross-lingual transfer learning by reusing the encoder from a multilingual NMT system and stitching it with a task-specific classifier component. Our proposed model achieves significant improvements in the English setup on three benchmark tasks - Amazon Reviews, SST and SNLI. Further, our system can perform classification in a new language for which no classification data was seen during training, showing that zero-shot classification is possible and remarkably competitive. In order to understand the underlying factors contributing to this finding, we conducted a series of analyses on the effect of the shared vocabulary, the training data type for NMT, classifier complexity, encoder representation power, and model generalization on zero-shot performance. Our results provide strong evidence that the representations learned from multilingual NMT systems are widely applicable across languages and tasks.
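Motivation and Objectives
- Show that reusing a multilingual NMT encoder improves performance on downstream NLP tasks.
- Show that the approach enables zero-shot classification in languages with no task-specific training data.
- Analyze the factors that affect zero-shot performance (shared vocabulary, training data type, encoder depth, classifier capacity, and training dynamics).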
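Proposed Method
- Train a multilingual NMT system for English-French translation with a shared encoder and language-specific decoders, and reuse its encoder as a pretrained component.
- Attach a task-specific classifier made of pre-pooling, pooling, and post-pooling networks, which turns the encoder states into a fixed-size representation for prediction.
- Evaluate on English tasks (Amazon Reviews, SST) and on SNLI to measure the transfer gains from the multilingual encoder.
- Compare freezing the encoder with fine-tuning it to assess the effect on performance.
- Extend the setup to SNLI by using multi-source encoding of the premise and the hypothesis.
- Compare against state-of-the-art baselines and cross-lingual embedding methods in the zero-shot setting.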
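The classifier component described above can be sketched in plain Python as follows. This is a minimal, untrained illustration under stated assumptions, not the authors' implementation: the class and function names are hypothetical, the encoder states are supplied directly as lists of vectors (in the real system they come from the pretrained NMT encoder, which may be frozen or fine-tuned), and max pooling over time is assumed for the pooling step. The point it shows is that sentences of any length map to a vector of the same size before the post-pooling layer.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(x: Vector, w: Matrix, b: Vector) -> Vector:
    """Affine map y = W x + b, with W stored as one row per output unit."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def softmax(z: Vector) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class PoolingClassifier:
    """Hypothetical pre-pooling -> pooling -> post-pooling head (weights untrained)."""

    def __init__(self, enc_dim: int, hidden_dim: int, num_classes: int, seed: int = 0):
        rng = random.Random(seed)
        self.w_pre = random_matrix(hidden_dim, enc_dim, rng)
        self.b_pre = [0.0] * hidden_dim
        self.w_post = random_matrix(num_classes, hidden_dim, rng)
        self.b_post = [0.0] * num_classes

    def pool(self, states: List[Vector]) -> Vector:
        # Pre-pooling: the same feed-forward layer is applied to every token state.
        projected = [relu(linear(h, self.w_pre, self.b_pre)) for h in states]
        # Pooling (assumed: element-wise max over time) yields a fixed-size vector
        # regardless of how many tokens the sentence has.
        return [max(col) for col in zip(*projected)]

    def predict_proba(self, states: List[Vector]) -> Vector:
        # Post-pooling: map the fixed-size vector to class probabilities.
        pooled = self.pool(states)
        return softmax(linear(pooled, self.w_post, self.b_post))


# Usage: stand-in "encoder states" for a short and a long sentence.
clf = PoolingClassifier(enc_dim=4, hidden_dim=3, num_classes=2)
short_states = [[0.1, 0.2, 0.3, 0.4]] * 2
long_states = [[0.1, 0.2, 0.3, 0.4]] * 50
print(len(clf.pool(short_states)), len(clf.pool(long_states)))  # 3 3
print(clf.predict_proba(long_states))
```

Pooling is what lets the same head serve both short SST sentences and long Amazon reviews; in training, only the head's weights (and optionally the encoder's) would be updated.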
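Experimental Results
Research Questions
- RQ1: Can a multilingual NMT encoder provide transferable, language-agnostic representations for downstream NLP tasks?
- RQ2: Does reusing the encoder improve performance on English tasks compared with a randomly initialized encoder?
- RQ3: Is zero-shot classification feasible in a new language (e.g., French) without any task-specific French training data, and can it come close to the bridged setting?
- RQ4: Which factors (vocabulary sharing, multilingual training data, encoder depth, classifier capacity, training dynamics) have the greatest effect on zero-shot performance?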
Key Findings
- Reusing the multilingual NMT encoder yields significant accuracy gains over the baseline Encoder-Classifier with a randomly initialized encoder on Amazon (En/Fr), SST, and SNLI.
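- Freezing the encoder after initialization can further improve performance, particularly on long-text tasks such as Amazon Reviews.
- For zero-shot French classification, the pretrained encoder substantially improves zero-shot accuracy, approaching the bridged performance (within a few points on several tasks).
- On SNLI (Fr), the best zero-shot approach outperforms several cross-lingual embedding baselines by a clear margin (e.g., 73.88% versus lower baseline scores).
- The analysis shows that a shared subword vocabulary aids generalization, but strong zero-shot performance also requires multilingual training data; encoder depth and model capacity are crucial for learning an interlingua representation.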
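The zero-shot protocol behind these findings can be illustrated with a deliberately tiny toy. This is only a sketch of the train-on-English, test-on-French idea, not the paper's model: the `SHARED_ENCODER` lookup table is a hypothetical stand-in for the multilingual NMT encoder (which learns such cross-lingual alignment from parallel data rather than from a hand-written dictionary), and the count-based classifier is an assumption chosen for brevity. The classifier never sees a French sentence during training, yet it can label French input because both languages land in the same representation space.

```python
from collections import Counter

# Hypothetical stand-in for a multilingual encoder: English and French words
# assumed to be mapped to the same language-agnostic "concept".
SHARED_ENCODER = {
    "good": "POS", "great": "POS", "bon": "POS", "excellent": "POS",
    "bad": "NEG", "awful": "NEG", "mauvais": "NEG", "horrible": "NEG",
}


def encode(sentence):
    """Map a sentence in either language to a bag of shared concepts."""
    return Counter(SHARED_ENCODER[w] for w in sentence.lower().split() if w in SHARED_ENCODER)


def train(examples):
    """Accumulate per-concept label counts from (sentence, label) pairs."""
    weights = {}
    for sentence, label in examples:
        for concept, count in encode(sentence).items():
            weights.setdefault(concept, Counter())[label] += count
    return weights


def classify(weights, sentence):
    """Score each label by the concepts present; None if nothing is recognized."""
    scores = Counter()
    for concept, count in encode(sentence).items():
        for label, w in weights.get(concept, {}).items():
            scores[label] += w * count
    return scores.most_common(1)[0][0] if scores else None


# The classifier is trained on English examples only.
english_train = [("a great movie", "positive"), ("an awful plot", "negative"),
                 ("good acting", "positive"), ("bad ending", "negative")]
model = train(english_train)

# Zero-shot: French sentences never appear in the classifier's training data.
print(classify(model, "un film excellent"))  # positive
print(classify(model, "une fin horrible"))   # negative
```

In the real system the quality of this shared space is exactly what the analysis above probes: a shared vocabulary alone is not enough, and the encoder must be trained on multilingual data for French and English inputs to align.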
This summary was generated by AI and reviewed by a human editor.