QUICK REVIEW

[Paper Review] Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning

Sandeep Subramanian, Adam Trischler|PolyPublie (École Polytechnique de Montréal)|Mar 30, 2018

Topic Modeling | References: 45 | Cited by: 101

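One-sentence summary

The paper proposes a large-scale multi-task learning framework that shares a single encoder across different sentence-level tasks (machine translation, parsing, skip-thoughts, natural language inference) to produce general-purpose, fixed-length sentence representations that transfer well to unseen tasks and perform well in low-resource settings.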

ABSTRACT

A lot of the recent success in natural language processing (NLP) has been driven by distributed vector representations of words trained on large amounts of text in an unsupervised manner. These representations are typically used as general purpose features for words across a range of NLP problems. However, extending this success to learning representations of sequences of words, such as sentences, remains an open problem. Recent work has explored unsupervised as well as supervised learning techniques with different training objectives to learn general purpose fixed-length sentence representations. In this work, we present a simple, effective multi-task learning framework for sentence representations that combines the inductive biases of diverse training objectives in a single model. We train this model on several data sources with multiple training objectives on over 100 million sentences. Extensive experiments demonstrate that sharing a single recurrent sentence encoder across weakly related tasks leads to consistent improvements over previous methods. We present substantial improvements in the context of transfer learning and low-resource settings using our learned general-purpose representations.

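Motivation and goals

Motivate and build general-purpose, fixed-length sentence representations.
Combine diverse training objectives to obtain robust sentence encodings.
Demonstrate transfer learning improvements on new tasks and in low-resource settings.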

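Proposed method

Uses a single bidirectional GRU encoder shared across multiple sequence-to-sequence and classification tasks.
Trains in a one-to-many multi-task setup covering 124 million sentence pairs drawn from English-French and English-German translation, skip-thoughts, parsing, and NLI.
Conditions the decoders on the encoder representation without attention, so that the encoder must produce a fixed-length sentence vector.
Combines several objectives: multilingual NMT, constituency parsing, skip-thoughts, and natural language inference.
Trains with a simple uniform task-switching scheme, occasionally interleaving NLI minibatches.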
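
The task-switching scheme in the last point can be sketched as a small scheduler. This is a minimal illustration rather than the authors' implementation: the task names, the `task_schedule` function, and the `nli_every` interval are all assumptions, since the summary only says that NLI minibatches are interleaved "occasionally".

```python
import random
from collections import Counter

# Illustrative task names; the summary lists these objectives but not identifiers.
SEQ2SEQ_TASKS = ["en-fr", "en-de", "skip-thoughts", "parsing"]


def task_schedule(num_steps, nli_every=10, seed=0):
    """Yield one task name per training step.

    Sequence-to-sequence tasks are drawn uniformly at random. Every
    `nli_every`-th step is replaced by an NLI minibatch. The interval is a
    hypothetical knob, not a value taken from the paper.
    """
    rng = random.Random(seed)
    for step in range(1, num_steps + 1):
        if step % nli_every == 0:
            yield "nli"
        else:
            yield rng.choice(SEQ2SEQ_TASKS)


schedule = list(task_schedule(1000))
counts = Counter(schedule)
print(counts)
```

In a real training loop, each yielded name would select the matching decoder or classifier head, while the gradient update always flows through the one shared encoder.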

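Experimental results

Research questions

RQ1: Does sharing a single encoder across weakly related tasks improve general-purpose sentence representations?
RQ2: How do the diverse tasks contribute to encoding different linguistic properties (syntax, meaning, length) in the representation?
RQ3: Do the learned representations transfer to unseen tasks and perform well with low-resource data?
RQ4: How do the learned word embeddings compare with existing word embeddings on standard benchmarks?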

Key findings

Model	MR	CR	SUBJ	MPQA	SST	TREC	MRPC	SICK-R	SICK-E	STSB	Δ
Our Models +STN +Fr +De +NLI +L	81.7	87.3	94.2	90.8	84.0	94.2	77.1/83.0	0.887	87.1	78.7/78.2	1.33
Our Models +STN +Fr +De +NLI +L +STP	82.7	88.0	94.1	91.2	84.5	92.4	77.8/83.9	0.885	86.8	78.7/78.4	1.44
Our Models +STN +Fr +De +NLI +L +STP +Par	82.5	87.7	94.0	90.9	83.2	93.0	78.6/84.4	0.888	87.8	78.9/78.6	1.48
+STN +Fr +De +NLI +L	81.2	86.4	93.4	90.8	84.0	93.2	76.6/82.7	0.884	87.0	79.2/79.1	0.99
+STN +Fr +De +NLI +2L +STP	82.8	88.3	94.0	91.3	83.6	92.6	77.4/83.3	0.884	87.6	79.2/79.1	1.47

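Multi-task training with diverse objectives outperforms previous fixed-length representations in transfer performance.
Increasing capacity (more hidden units) and adding layers further improves transfer gains on several tasks.
Our representations improve over InferSent by 1.1–2.0% on sentiment tasks and yield significant gains on the TREC and MRPC transfer tasks.
Including parsing and multilingual NMT strengthens the syntactic and entailment transfer signals, whereas NLI alone encodes syntax but benefits less from additional tasks.
In low-resource settings, a linear classifier trained on our representations can outperform some task-specific models trained on more data, for example when using only 6% of the labeled data on the Quora dataset.
On standard benchmarks, the word embeddings learned in our framework are competitive with popular pretrained embeddings.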
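
The low-resource finding follows a common transfer protocol: freeze the sentence encoder, then fit only a linear classifier on a small labeled subset. The sketch below illustrates that protocol on synthetic data. The vectors are toy Gaussian clusters standing in for frozen sentence representations, and `make_vector`, `train_logreg`, and `predict` are hypothetical helpers; nothing here reproduces the paper's data or numbers. The 6% split simply mirrors the fraction quoted above.

```python
import math
import random


def make_vector(label, rng, dim=8):
    # Toy stand-in for a frozen sentence vector:
    # class 1 is centred at +1 in every dimension, class 0 at -1.
    mean = 1.0 if label == 1 else -1.0
    return [rng.gauss(mean, 1.0) for _ in range(dim)]


def train_logreg(features, labels, epochs=200, lr=0.5):
    """Fit binary logistic regression with full-batch gradient descent."""
    dim = len(features[0])
    w, b, n = [0.0] * dim, 0.0, len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - y  # predicted prob minus label
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


rng = random.Random(0)
labels = [i % 2 for i in range(1000)]
features = [make_vector(y, rng) for y in labels]

# Use only 6% of the labeled examples for training; evaluate on the rest.
n_train = len(labels) * 6 // 100
w, b = train_logreg(features[:n_train], labels[:n_train])
held_out = list(zip(features[n_train:], labels[n_train:]))
accuracy = sum(predict(w, b, x) == y for x, y in held_out) / len(held_out)
print(f"trained on {n_train} examples, held-out accuracy = {accuracy:.3f}")
```

Because the encoder stays fixed, the only trainable parameters are the classifier's weights and bias, which is why such a setup can work with very few labels when the representation already separates the classes well.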


This review was generated by AI and checked by a human editor.