QUICK REVIEW

[Paper Review] ERNIE 2.0: A Continual Pre-training Framework for Language Understanding

Yu Sun, Shuohuan Wang | arXiv (Cornell University) | Jul 29, 2019

Topic Modeling | References: 30 | Cited by: 74

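One-Sentence Summary

ERNIE 2.0 turns pre-training into a continual process that incrementally learns lexical, syntactic, and semantic knowledge, and it improves on BERT and XLNet on the English GLUE tasks and on Chinese datasets.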

ABSTRACT

Recently, pre-trained models have achieved state-of-the-art results in various language understanding tasks, which indicates that pre-training on large-scale corpora may play a crucial role in natural language processing. Current pre-training procedures usually focus on training the model with several simple tasks to grasp the co-occurrence of words or sentences. However, besides co-occurring, there exists other valuable lexical, syntactic and semantic information in training corpora, such as named entity, semantic closeness and discourse relations. In order to extract to the fullest extent, the lexical, syntactic and semantic information from training corpora, we propose a continual pre-training framework named ERNIE 2.0 which builds and learns incrementally pre-training tasks through constant multi-task learning. Experimental results demonstrate that ERNIE 2.0 outperforms BERT and XLNet on 16 tasks including English tasks on GLUE benchmarks and several common tasks in Chinese. The source codes and pre-trained models have been released at https://github.com/PaddlePaddle/ERNIE.

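Motivation and Objectives

Motivate pre-training that goes beyond simple co-occurrence by exploiting the lexical, syntactic, and semantic information in training corpora.
Propose a continual multi-task pre-training framework (ERNIE 2.0) that incrementally builds and learns diverse pre-training tasks.
Demonstrate improvements over BERT and XLNet on the English GLUE benchmark and on a range of Chinese NLP tasks.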

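Proposed Method

Construct word-aware, structure-aware, and semantic-aware pre-training tasks from large corpora using self-supervised or weakly supervised signals.
Use a shared Transformer encoder with task embeddings so that knowledge transfers across tasks.
Apply continual multi-task learning: each task is allocated a number of training iterations, and the model is updated across them to balance efficiency against forgetting while retaining previously learned knowledge.
Use the [CLS] token and the [SEP] separator within the Transformer framework, and distinguish tasks through task embeddings.
Fine-tune the pre-trained ERNIE 2.0 on downstream tasks such as question answering, natural language inference, and semantic similarity.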
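The role of the task embedding can be illustrated with a small sketch. It assumes the common BERT-style scheme in which the encoder input at each position is the sum of token, position, and segment embeddings, with one extra task vector added at every position. The table sizes, the toy ids (0 for [CLS], 1 for [SEP]), and the names `make_table` and `embed_input` are hypothetical and are not taken from the released ERNIE code.

```python
import random

def make_table(rows, dim, rng):
    """Random lookup table standing in for a learned embedding matrix."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]

def embed_input(token_ids, segment_ids, task_id, tables):
    """Sum token, position, segment and task embeddings for each position."""
    if len(token_ids) != len(segment_ids):
        raise ValueError("token_ids and segment_ids must have the same length")
    out = []
    for pos, (tok, seg) in enumerate(zip(token_ids, segment_ids)):
        parts = (
            tables["token"][tok],
            tables["position"][pos],
            tables["segment"][seg],
            tables["task"][task_id],  # same task vector at every position
        )
        out.append([sum(vals) for vals in zip(*parts)])
    return out

rng = random.Random(0)
DIM = 4  # toy hidden size (assumption)
tables = {
    "token": make_table(10, DIM, rng),    # toy vocabulary of 10 ids
    "position": make_table(8, DIM, rng),  # up to 8 positions
    "segment": make_table(2, DIM, rng),   # sentence A / sentence B
    "task": make_table(3, DIM, rng),      # 3 hypothetical pre-training tasks
}

# Toy ids: 0 = [CLS], 1 = [SEP]; the remaining ids are arbitrary tokens.
token_ids = [0, 5, 6, 1, 7, 1]
segment_ids = [0, 0, 0, 0, 1, 1]

x_task0 = embed_input(token_ids, segment_ids, task_id=0, tables=tables)
x_task2 = embed_input(token_ids, segment_ids, task_id=2, tables=tables)
```

Because the encoder weights are shared, the same tokens fed under two different task ids differ only by the difference between the two task vectors. In this sketch, that offset is the only signal telling the encoder which pre-training objective a batch belongs to.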

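Experimental Results

Research Questions

RQ1: Does continual multi-task pre-training that exploits lexical, syntactic, and semantic signals yield better language representations than single-task pre-training?
RQ2: How can multiple pre-training tasks be trained continually and efficiently without forgetting previously learned knowledge?
RQ3: Do ERNIE 2.0 representations outperform BERT and XLNet on the standard English benchmark (GLUE) and on Chinese NLP tasks?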

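Key Findings

ERNIE 2.0 outperforms BERT and XLNet on 16 tasks, including the English GLUE benchmark and several Chinese tasks.
On English GLUE, ERNIE 2.0 LARGE surpasses BERT LARGE and XLNet LARGE on most tasks and reaches a GLUE test score of 83.6, a 3.1% improvement over BERT LARGE.
On Chinese tasks, ERNIE 2.0 LARGE achieves the best results across nine tasks; ERNIE 1.0 BASE already outperforms BERT on some tasks, and ERNIE 2.0 improves the results further.
The continual multi-task learning strategy outperforms both multi-task learning from scratch and conventional continual learning, showing effective knowledge retention and task adaptation.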
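The three training strategies compared in the last finding can be contrasted as step schedules. The sketch below is illustrative only: the task names, the step budget, and the even-split allocation rule in `continual_multitask` are assumptions made for this example and are not the allocation reported in the paper.

```python
def continual_learning(tasks, steps_per_task):
    """One stage per task; each stage trains only the newly added task."""
    return [{task: steps_per_task} for task in tasks]

def multitask_from_scratch(tasks, steps_per_task):
    """A single stage that trains all tasks together from the start."""
    return [{task: steps_per_task for task in tasks}]

def continual_multitask(tasks, steps_per_task):
    """One stage per new task; every stage also revisits earlier tasks.

    Assumed rule (illustrative): a task introduced at stage i spreads its
    budget evenly over stages i..last, so every task ends with the same total.
    """
    n = len(tasks)
    stages = [dict() for _ in range(n)]
    for i, task in enumerate(tasks):
        share, extra = divmod(steps_per_task, n - i)
        for j in range(i, n):
            # Any leftover steps go to the stage where the task is introduced.
            stages[j][task] = share + (extra if j == i else 0)
    return stages

def total_steps(schedule):
    """Total training steps each task receives across all stages."""
    totals = {}
    for stage in schedule:
        for task, steps in stage.items():
            totals[task] = totals.get(task, 0) + steps
    return totals

tasks = ["task_a", "task_b", "task_c"]  # hypothetical pre-training tasks
budget = 300                            # hypothetical steps per task

schedules = {
    "continual": continual_learning(tasks, budget),
    "multitask": multitask_from_scratch(tasks, budget),
    "continual_multitask": continual_multitask(tasks, budget),
}

for name, schedule in schedules.items():
    print(name, schedule, total_steps(schedule))
```

All three schedules give every task the same total budget; they differ only in when the steps happen. Plain continual learning never revisits an earlier task, which is where forgetting can occur. Multi-task learning from scratch needs every task to be available before training starts. The continual multi-task schedule adds tasks one stage at a time while still training the earlier ones.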


This review was generated by AI and checked by human editors.