QUICK REVIEW

[Paper Review] Towards Universal Paraphrastic Sentence Embeddings

John Wieting, Mohit Bansal|arXiv (Cornell University)|Nov 25, 2015

Topic Modeling · References: 64 · Cited by: 117

One-Sentence Summary

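This paper proposes a simple but highly effective way to learn general-purpose sentence embeddings: averaging word vectors trained on the Paraphrase Database (PPDB). The approach is competitive with task-tuned systems on out-of-domain textual similarity and, combined with supervised models, rivals the state of the art on SICK similarity and entailment. Despite its simplicity, it outperforms complex LSTM models on out-of-domain data, establishing a new baseline for universal sentence embeddings that needs no recurrent or other complex compositional architecture.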
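
The composition step can be sketched in a few lines. This is a minimal illustration, assuming toy 3-dimensional vectors invented for the example (real paragram-phrase vectors are learned from PPDB and are much higher-dimensional); the names EMBEDDINGS, embed, and cosine are hypothetical helpers, not part of any released code. A sentence embedding is the mean of its word vectors, and two sentences are compared by cosine similarity.

```python
import math

# Hypothetical toy vectors, invented for illustration only.
EMBEDDINGS = {
    "a": [0.1, 0.0, 0.1],
    "man": [0.9, 0.2, 0.1],
    "guy": [0.8, 0.3, 0.1],
    "is": [0.1, 0.1, 0.0],
    "playing": [0.2, 0.9, 0.3],
    "guitar": [0.1, 0.4, 0.9],
}


def embed(sentence, embeddings):
    """Word averaging: mean of the vectors of known tokens, no extra parameters."""
    vectors = [embeddings[t] for t in sentence.lower().split() if t in embeddings]
    if not vectors:
        raise ValueError("no known tokens in sentence")
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


s1 = embed("a man is playing guitar", EMBEDDINGS)
s2 = embed("a guy is playing guitar", EMBEDDINGS)
print(round(cosine(s1, s2), 3))
```

Unknown tokens are simply skipped in this sketch; how the released models handle out-of-vocabulary words is not covered in this summary.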

ABSTRACT

We consider the problem of learning general-purpose, paraphrastic sentence embeddings based on supervision from the Paraphrase Database (Ganitkevitch et al., 2013). We compare six compositional architectures, evaluating them on annotated textual similarity datasets drawn both from the same distribution as the training data and from a wide range of other domains. We find that the most complex architectures, such as long short-term memory (LSTM) recurrent neural networks, perform best on the in-domain data. However, in out-of-domain scenarios, simple architectures such as word averaging vastly outperform LSTMs. Our simplest averaging model is even competitive with systems tuned for the particular tasks while also being extremely efficient and easy to use. In order to better understand how these architectures compare, we conduct further experiments on three supervised NLP tasks: sentence similarity, entailment, and sentiment classification. We again find that the word averaging models perform well for sentence similarity and entailment, outperforming LSTMs. However, on sentiment classification, we find that the LSTM performs very strongly, even recording new state-of-the-art performance on the Stanford Sentiment Treebank. We then demonstrate how to combine our pretrained sentence embeddings with these supervised tasks, using them both as a prior and as a black box feature extractor. This leads to performance rivaling the state of the art on the SICK similarity and entailment tasks. We release all of our resources to the research community with the hope that they can serve as the new baseline for further work on universal sentence embeddings.
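
The abstract mentions using the pretrained embeddings as a black-box feature extractor. A minimal sketch, assuming the common pair-feature recipe of concatenating the element-wise product and the absolute difference of two frozen sentence embeddings (as used by Tai et al., 2015); this summary does not spell out the paper's exact feature set, and pair_features is a hypothetical helper.

```python
def pair_features(h1, h2):
    """Build classifier features from two frozen sentence embeddings.

    Assumed recipe: element-wise product followed by absolute difference.
    """
    if len(h1) != len(h2):
        raise ValueError("embeddings must have the same dimension")
    product = [a * b for a, b in zip(h1, h2)]
    abs_diff = [abs(a - b) for a, b in zip(h1, h2)]
    return product + abs_diff


# Toy embeddings for one sentence pair (values invented for illustration).
h1 = [0.5, -1.0, 2.0]
h2 = [1.0, 1.0, 2.0]
print(pair_features(h1, h2))  # → [0.5, -1.0, 4.0, 0.5, 2.0, 0.0]
```

The resulting vector would then be fed to a separately trained classifier or regressor for SICK similarity or entailment, while the embeddings themselves stay fixed.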

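Motivation and Goals

Develop general-purpose, paraphrastic sentence embeddings that transfer effectively across many NLP domains.
Evaluate six compositional architectures, ranging from simple word averaging to LSTMs, on in-domain and out-of-domain textual similarity tasks.
Determine whether simple models such as word averaging can outperform complex neural architectures when transferred to new domains without target-domain training data.
Show that pretrained sentence embeddings can improve downstream supervised NLP tasks such as similarity, entailment, and sentiment classification.
Release a new, easily accessible baseline for universal sentence embeddings to accelerate future research.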

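Proposed Method

Train sentence embeddings by averaging word vectors learned from the Paraphrase Database (PPDB); apart from the word vectors themselves, the model has no additional compositional parameters.
Initialize the word vectors with paragram-sl999 embeddings and fine-tune them by backpropagation on PPDB phrase pairs, producing the paragram-phrase embeddings.
Learn a multiplicative weight for each word vector based on the L2 norm of the corresponding paragram-phrase embedding, so that important content words are emphasized.
Evaluate the models on in-domain data drawn from the same distribution as the PPDB training data and on out-of-domain data (22 SemEval textual similarity datasets) to measure transferability and robustness.
Use the pretrained sentence embeddings either as a prior or as a fixed feature extractor in supervised models for similarity, entailment, and sentiment classification.
Score sentence pairs by the cosine similarity of their embeddings, and evaluate paraphrase detection and textual similarity by the Pearson correlation of these scores with human judgments.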
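
The PPDB fine-tuning step can be sketched as a margin-based ranking loss. This is an assumption modeled on the authors' earlier PPDB work, not something stated in this summary: each paraphrase pair (x1, x2) should be more similar than either side is to its hardest negative drawn from the other pairs in the mini-batch, by a margin delta. The sketch only evaluates the loss (no gradients, no regularization), and delta=0.4 and all function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def embed(tokens, emb):
    """Word averaging: mean of the token vectors."""
    vectors = [emb[t] for t in tokens]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def hardest_negative(anchor, candidates):
    """Candidate most similar to the anchor (assumed in-batch negative choice)."""
    return max(candidates, key=lambda c: cosine(anchor, c))


def batch_margin_loss(pairs, emb, delta=0.4):
    """Average hinge loss over a mini-batch of paraphrase pairs.

    For each pair (x1, x2) with negatives t1, t2 taken from other pairs:
      max(0, delta - cos(g(x1), g(x2)) + cos(g(x1), g(t1)))
    + max(0, delta - cos(g(x1), g(x2)) + cos(g(x2), g(t2)))
    """
    if len(pairs) < 2:
        raise ValueError("need at least two pairs to draw in-batch negatives")
    g1 = [embed(x1, emb) for x1, _ in pairs]
    g2 = [embed(x2, emb) for _, x2 in pairs]
    total = 0.0
    for i in range(len(pairs)):
        # Negatives come only from the other pairs in the batch.
        others = [g for j in range(len(pairs)) if j != i for g in (g1[j], g2[j])]
        positive = cosine(g1[i], g2[i])
        t1 = hardest_negative(g1[i], others)
        t2 = hardest_negative(g2[i], others)
        total += max(0.0, delta - positive + cosine(g1[i], t1))
        total += max(0.0, delta - positive + cosine(g2[i], t2))
    return total / len(pairs)


# Hypothetical 2-d vectors: each pair is a perfect paraphrase, pairs are orthogonal.
EMB = {"cat": [1.0, 0.0], "kitty": [1.0, 0.0], "car": [0.0, 1.0], "auto": [0.0, 1.0]}
batch = [(["cat"], ["kitty"]), (["car"], ["auto"])]
print(batch_margin_loss(batch, EMB))             # → 0.0 (margin already satisfied)
print(batch_margin_loss(batch, EMB, delta=1.5))  # → 1.0 (margin cannot be met)
```

A larger margin can only increase the loss, which is why the second call is positive even though every pair is a perfect paraphrase.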

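Experimental Results

Research Questions

RQ1: Can a simple word-averaging model outperform complex neural architectures such as LSTMs when transferred to sentence similarity tasks in other domains?
RQ2: How well do sentence embeddings trained on paraphrase data generalize across diverse domains such as news, tweets, and image captions?
RQ3: Within the averaging framework, how much does learning word vectors for composition improve over directly averaging pretrained word vectors?
RQ4: Can universal sentence embeddings serve as an effective prior or feature extractor for supervised NLP tasks such as entailment and sentiment classification?
RQ5: What role do word-importance weights derived from embedding norms play in improving sentence representations?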

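Key Findings

Despite its simplicity, the word-averaging model reaches an average Pearson's $r$ (×100) of 66.83 across 22 SemEval textual similarity datasets, outperforming the LSTM by 16.5 points on average.
The paragram-phrase embeddings rank in the top 25% of submitted systems on every SemEval STS task from 2012 to 2015, and are best or tied for best on four datasets.
Compared with averaging GloVe or paragram-sl999 embeddings, the model scores 17.1 and 12.8 points higher on average, respectively.
The multiplicative weights learned from the L2 norms of the paragram-phrase vectors account for at least 64.76% of the improvement over the original paragram-sl999 embeddings.
On the Stanford Sentiment Treebank, the LSTM reaches a new state-of-the-art accuracy of 89.2% on binary (coarse-grained) sentiment classification, outperforming the averaging models on this task.
Used as a fixed feature extractor or as a prior, the pretrained sentence embeddings rival state-of-the-art models on the SICK similarity and entailment tasks.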
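
Scores such as 66.83 are Pearson's r between model similarities and human ratings, multiplied by 100 and averaged over datasets. A minimal sketch of that evaluation, using made-up scores for illustration; pearson_r and average_score are hand-written hypothetical helpers (Python 3.10+ also provides statistics.correlation for the same coefficient).

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if std_x == 0 or std_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (std_x * std_y)


def average_score(datasets):
    """Mean of 100 * r over a list of (model_scores, gold_scores) datasets."""
    scores = [100 * pearson_r(pred, gold) for pred, gold in datasets]
    return sum(scores) / len(scores)


# Made-up cosine similarities and 0-5 human ratings for two tiny "datasets".
toy = [
    ([0.91, 0.40, 0.75, 0.10], [4.8, 2.0, 3.9, 0.5]),
    ([0.30, 0.85, 0.60], [1.0, 4.0, 3.5]),
]
print(round(average_score(toy), 2))
```

Averaging per-dataset correlations, rather than pooling all pairs, keeps large datasets from dominating the summary number.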

This review was generated by AI and reviewed by human editors.