QUICK REVIEW

[Paper Review] MPNet: Masked and Permuted Pre-training for Language Understanding

Kaitao Song, Xu Tan|arXiv (Cornell University)|Apr 20, 2020

Topic Modeling | References: 26 | Cited by: 503

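One-Sentence Summary

MPNet unifies MLM (BERT) and PLM (XLNet) by modeling the dependency among predicted tokens while using full-sentence position information, achieving significant gains on benchmarks such as GLUE and SQuAD.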

ABSTRACT

BERT adopts masked language modeling (MLM) for pre-training and is one of the most successful pre-training models. Since BERT neglects dependency among predicted tokens, XLNet introduces permuted language modeling (PLM) for pre-training to address this problem. However, XLNet does not leverage the full position information of a sentence and thus suffers from position discrepancy between pre-training and fine-tuning. In this paper, we propose MPNet, a novel pre-training method that inherits the advantages of BERT and XLNet and avoids their limitations. MPNet leverages the dependency among predicted tokens through permuted language modeling (vs. MLM in BERT), and takes auxiliary position information as input to make the model see a full sentence and thus reduce the position discrepancy (vs. PLM in XLNet). We pre-train MPNet on a large-scale dataset (over 160GB text corpora) and fine-tune on a variety of downstream tasks (GLUE, SQuAD, etc.). Experimental results show that MPNet outperforms MLM and PLM by a large margin, and achieves better results on these tasks compared with previous state-of-the-art pre-training methods (e.g., BERT, XLNet, RoBERTa) under the same model setting. The code and the pre-trained models are available at: https://github.com/microsoft/MPNet.

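Research Motivation and Goals

Advance pre-training by addressing the limitation of MLM (predicted tokens are modeled independently of each other) and that of PLM (no full-sentence position information).
Develop a pre-training objective that exploits the dependency among predicted tokens while also incorporating full-sentence position information.
Demonstrate that MPNet brings significant improvements over BERT, XLNet, RoBERTa, and ELECTRA on a range of NLP benchmarks.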
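The two limitations can be made concrete by listing, for each predicted token, what each objective conditions on. The sketch below is a simplified illustration, not the authors' code: `conditioning_contexts` is a hypothetical helper, the toy sentence and the choice of predicting its last two tokens are arbitrary, and the MPNet view assumes that positions already predicted show their real token while positions not yet predicted keep a `[M]` placeholder. Attention details (two-stream attention, exact masks) are deliberately left out.

```python
MASK = "[M]"


def conditioning_contexts(tokens, perm, c, scheme):
    """For each predicted position perm[c:], return (target, visible context).

    tokens: sentence as a list of strings (original order)
    perm:   a permutation of range(len(tokens)); perm[:c] is the non-predicted part
    scheme: "mlm", "plm" or "mpnet" (simplified views, see lead-in)
    """
    if scheme not in ("mlm", "plm", "mpnet"):
        raise ValueError(f"unknown scheme: {scheme}")
    predicted = perm[c:]
    result = []
    for t, pos in enumerate(predicted):
        # Real tokens the model conditions on.
        seen = set(perm[:c])
        if scheme != "mlm":
            # PLM and MPNet are autoregressive over the predicted part.
            seen |= set(predicted[:t])
        # Positions represented only by a [M] placeholder.
        if scheme == "mlm":
            masked = set(predicted)       # every predicted position stays masked
        elif scheme == "mpnet":
            masked = set(predicted[t:])   # not-yet-predicted positions keep their [M]
        else:
            masked = set()                # PLM: no placeholders for future tokens
        context = [tokens[i] if i in seen else MASK
                   for i in range(len(tokens)) if i in seen or i in masked]
        result.append((tokens[pos], " ".join(context)))
    return result


if __name__ == "__main__":
    sentence = "the task is sentence classification".split()
    for scheme in ("mlm", "plm", "mpnet"):
        for target, ctx in conditioning_contexts(sentence, [0, 1, 2, 3, 4], 3, scheme):
            print(f"{scheme:>5}: P({target} | {ctx})")
```

Under these assumptions, MLM predicts "classification" from "the task is [M] [M]" without seeing "sentence" (no output dependency). When predicting "sentence", PLM sees only "the task is" and cannot tell how many tokens follow. MPNet sees "the task is [M] [M]" for the first target and "the task is sentence [M]" for the second, so it keeps both the dependency and the full-length position view.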

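Proposed Method

Introduces the MPNet objective: for a permuted order z of the n input positions, whose first c elements form the non-predicted part and whose remaining n - c elements are predicted, maximize $\mathbb{E}_{z \in \mathcal{Z}_n} \sum_{t=c+1}^{n} \log P(x_{z_t} \mid x_{z_{<t}}, M_{z_{>c}}; \theta)$, where $M_{z_{>c}}$ denotes mask tokens placed at the positions of the predicted part.
Uses two-stream self-attention to model the output dependency among predicted tokens.
Applies position compensation so that the query and content streams see the positions of the full sentence during pre-training.
Input design: following a permuted order of the original sequence, concatenate the non-predicted tokens, mask tokens standing in for the predicted part, and the predicted tokens.
Pre-trains on a large-scale corpus (about 160GB) and fine-tunes on downstream tasks (GLUE, SQuAD, RACE, IMDB).
Compares MPNet with MLM, PLM, and strong baselines (BERT, XLNet, RoBERTa) under the same model setting.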
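The input layout and position compensation described above can be sketched as a small data-preparation step. This is a minimal sketch under stated assumptions, not the official implementation: `build_mpnet_input` is a hypothetical helper, positions are 0-based, and the six-token sentence, the permutation, and `c = 3` are arbitrary illustrative choices. It only builds token and position sequences; it does not implement the two-stream attention masks.

```python
MASK = "[M]"


def build_mpnet_input(tokens, perm, c):
    """Hypothetical helper: lay out one training example as
    [x_{z<=c}, M_{z>c}, x_{z>c}] in permuted order.

    Returns (input_tokens, position_ids); position ids are 0-based indices
    into the original sentence.
    """
    if sorted(perm) != list(range(len(tokens))):
        raise ValueError("perm must be a permutation of range(len(tokens))")
    non_pred, pred = perm[:c], perm[c:]
    input_tokens = (
        [tokens[i] for i in non_pred]   # non-predicted tokens x_{z<=c}
        + [MASK] * len(pred)            # mask tokens M_{z>c}
        + [tokens[i] for i in pred]     # predicted tokens x_{z>c}
    )
    # Position compensation: each mask token reuses the original position of
    # the token it stands in for, so the first len(tokens) entries cover
    # every position of the sentence.
    position_ids = list(non_pred) + list(pred) + list(pred)
    return input_tokens, position_ids


if __name__ == "__main__":
    toks = ["x1", "x2", "x3", "x4", "x5", "x6"]
    inp, pos = build_mpnet_input(toks, [0, 4, 2, 3, 5, 1], c=3)
    print(inp)  # ['x1', 'x5', 'x3', '[M]', '[M]', '[M]', 'x4', 'x6', 'x2']
    print(pos)  # [0, 4, 2, 3, 5, 1, 3, 5, 1]
```

The design point is that the first `len(tokens)` position ids form a permutation of all sentence positions, so the model always "knows" the full sentence length and layout, which is the information the review says PLM lacks.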

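Experimental Results

Research Questions

RQ1: Can exploiting the dependency among predicted tokens (output dependency) during pre-training improve representations beyond MLM?
RQ2: Does incorporating full-sentence position information reduce the discrepancy between pre-training and fine-tuning relative to PLM?
RQ3: How does MPNet perform on standard benchmarks (GLUE, SQuAD, RACE, IMDB) compared with prior pre-training methods?
RQ4: What is the empirical impact of position compensation and the permutation mechanism in MPNet?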

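Key Findings

Under the same base-model setting, MPNet significantly outperforms MLM and PLM on the GLUE dev set.
In the reported experiments, MPNet achieves better GLUE results than BERT, XLNet, and RoBERTa.
On SQuAD v1.1 and v2.0, MPNet outperforms BERT, XLNet, and RoBERTa on the reported metrics.
When pre-trained on 16GB of data, MPNet shows strong results on RACE and IMDB; pre-training at the larger 160GB scale yields further gains.
Ablation studies confirm that position compensation and output dependency are important to MPNet's performance.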

This review was generated by AI and checked by human editors.