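[Paper Explained] Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning
This paper proposes a multi-task learning framework that shares a single recurrent encoder across several sentence-level tasks (multilingual NMT, constituency parsing, skip-thought, and natural language inference). The goal is a general-purpose, fixed-length sentence representation that transfers well to new tasks and to low-resource settings.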
A lot of the recent success in natural language processing (NLP) has been driven by distributed vector representations of words trained on large amounts of text in an unsupervised manner. These representations are typically used as general purpose features for words across a range of NLP problems. However, extending this success to learning representations of sequences of words, such as sentences, remains an open problem. Recent work has explored unsupervised as well as supervised learning techniques with different training objectives to learn general purpose fixed-length sentence representations. In this work, we present a simple, effective multi-task learning framework for sentence representations that combines the inductive biases of diverse training objectives in a single model. We train this model on several data sources with multiple training objectives on over 100 million sentences. Extensive experiments demonstrate that sharing a single recurrent sentence encoder across weakly related tasks leads to consistent improvements over previous methods. We present substantial improvements in the context of transfer learning and low-resource settings using our learned general-purpose representations.
Research Motivation and Objectives
- Motivate the need for general-purpose sentence representations beyond word embeddings.
- Propose a simple, scalable multi-task framework that combines diverse sentence-level training objectives.
- Show that shared encoding across weakly related tasks improves transfer performance and low-resource learning.
Proposed Method
- Use a one-to-many sequence-to-sequence model with a shared bidirectional GRU encoder and task-specific decoders.
- Train on diverse objectives: skip-thought vectors, multilingual neural machine translation (NMT), constituency parsing, and natural language inference (NLI).
- Condition decoders on the encoder representation h_x without using attention, enabling a single fixed-length sentence embedding.
- Interleave tasks during training (uniform task sampling; occasional NLI minibatches) and optimize with Adam.
- Evaluate representations by training a simple linear classifier on transfer tasks without updating the encoder parameters.
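The interleaving of tasks described above can be sketched as a simple schedule generator. This is a minimal illustration, not the authors' training code: the task names are placeholders, and the interval `nli_every` is an assumed value, since the summary only says that NLI minibatches occur occasionally. The forward pass, loss, and Adam update for each sampled task are omitted.

```python
import random
from collections import Counter

# Placeholder names for the sequence-to-sequence objectives; the two NMT
# entries stand for language pairs, which this summary does not name.
SEQ2SEQ_TASKS = ["skip_thought", "nmt_pair_1", "nmt_pair_2", "parsing"]


def task_schedule(num_updates, nli_every=10, seed=0):
    """Yield one task name per parameter update.

    Sequence-to-sequence tasks are sampled uniformly at random. After every
    `nli_every` such updates, one NLI minibatch is inserted. `nli_every` is
    an assumption made for this sketch.
    """
    rng = random.Random(seed)
    since_nli = 0  # sequence-to-sequence updates since the last NLI batch
    for _ in range(num_updates):
        if since_nli == nli_every:
            since_nli = 0
            yield "nli"
        else:
            since_nli += 1
            yield rng.choice(SEQ2SEQ_TASKS)


# Each yielded name would select the decoder (or classifier) to train on
# top of the shared encoder for that update.
schedule = list(task_schedule(110))
print(Counter(schedule))
```

Because every update touches the shared encoder regardless of which task was drawn, the encoder sees gradients from all objectives, while each decoder is only updated when its own task is sampled.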
Experimental Results
Research Questions
- RQ1: Does a single shared encoder trained on multiple sentence-level tasks learn more generalizable representations than task-specific or single-objective models?
- RQ2: Do diverse inductive biases from multiple tasks improve transfer performance, especially in low-resource settings?
- RQ3: Which tasks contribute most to capturing syntax, semantics, or other sentence characteristics?
- RQ4: How do fixed-length representations compare to attention-based or task-specific representations on transfer tasks?
Key Findings
- Representations learned via multi-task training generalize better across transfer tasks than several prior general-purpose methods.
- Adding more tasks and increasing encoder capacity yield consistent transfer gains on sentiment, entailment, and paraphrase tasks.
- The multi-task model improves low-resource transfer performance, achieving competitive results with only ~6% of labeled data on some tasks.
- Incorporating constituency parsing and multilingual NMT biases enhances syntactic and related linguistic signals in the embeddings.
- The learned word embeddings from the model are competitive with established embedding methods despite being trained from scratch.
- Probing shows that multi-task signals contribute to encoding syntax when parsing and multilingual translation are included, while NLI primarily supports semantic encoding.
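The low-resource transfer setting in the findings can be illustrated with a frozen-encoder probe trained on a fraction of the labeled data. This is a sketch under stated assumptions: `frozen_encoder` is a hypothetical bag-of-words stand-in for the pretrained sentence encoder, the toy sentiment data is invented for illustration, and the linear classifier is a plain logistic regression trained with SGD. It does not reproduce the paper's results or the ~6% figure.

```python
import math
import random

# Toy vocabulary for the stand-in encoder (hypothetical, for illustration).
VOCAB = {"good": 0, "great": 1, "bad": 2, "awful": 3, "movie": 4, "film": 5}


def frozen_encoder(sentence):
    """Stand-in for the pretrained encoder: a fixed-length count vector.

    It has no trainable parameters, mirroring the protocol in which the
    encoder is not updated during transfer evaluation.
    """
    vec = [0.0] * len(VOCAB)
    for token in sentence.lower().split():
        if token in VOCAB:
            vec[VOCAB[token]] += 1.0
    return vec


def subsample(data, fraction, seed=0):
    """Keep a random fraction of the labeled examples (at least one)."""
    k = max(1, round(len(data) * fraction))
    return random.Random(seed).sample(data, k)


def train_linear_probe(features, labels, epochs=200, lr=0.5):
    """Fit logistic regression on fixed features with plain SGD."""
    w, b = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            grad = 1.0 / (1.0 + math.exp(-z)) - y  # d(loss)/dz
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


def accuracy(w, b, features, labels):
    hits = 0
    for x, y in zip(features, labels):
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        hits += int((z > 0) == bool(y))
    return hits / len(labels)


train = [("good movie", 1), ("great film", 1), ("good film", 1),
         ("great movie", 1), ("bad movie", 0), ("awful film", 0),
         ("bad film", 0), ("awful movie", 0)]
test = [("a good film", 1), ("an awful movie", 0)]
test_x = [frozen_encoder(s) for s, _ in test]
test_y = [y for _, y in test]

for fraction in (0.25, 0.5, 1.0):
    subset = subsample(train, fraction)
    w, b = train_linear_probe([frozen_encoder(s) for s, _ in subset],
                              [y for _, y in subset])
    print(fraction, accuracy(w, b, test_x, test_y))
```

Only the probe weights `w` and `b` are learned here, so any difference in accuracy across fractions reflects how much label supervision the fixed features need, which is the quantity the low-resource comparison is about.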
This summary was generated by AI and reviewed by human editors.