QUICK REVIEW

[Paper Review] Improving Relation Extraction by Pre-trained Language Representations

Christoph Alt, Marc P. Hübner|arXiv (Cornell University)|Jun 7, 2019

Topic: Topic Modeling|References: 29|Cited by: 53

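One-Sentence Summary

TRE performs relation extraction with pre-trained language representations in a Transformer architecture, achieving state-of-the-art results on TACRED and SemEval 2010 Task 8 and showing improved sample efficiency.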

ABSTRACT

Current state-of-the-art relation extraction methods typically rely on a set of lexical, syntactic, and semantic features, explicitly computed in a pre-processing step. Training feature extraction models requires additional annotated language resources, which severely restricts the applicability and portability of relation extraction to novel languages. Similarly, pre-processing introduces an additional source of error. To address these limitations, we introduce TRE, a Transformer for Relation Extraction, extending the OpenAI Generative Pre-trained Transformer [Radford et al., 2018]. Unlike previous relation extraction models, TRE uses pre-trained deep language representations instead of explicit linguistic features to inform the relation classification and combines it with the self-attentive Transformer architecture to effectively model long-range dependencies between entity mentions. TRE allows us to learn implicit linguistic features solely from plain text corpora by unsupervised pre-training, before fine-tuning the learned language representations on the relation extraction task. TRE obtains a new state-of-the-art result on the TACRED and SemEval 2010 Task 8 datasets, achieving a test F1 of 67.4 and 87.1, respectively. Furthermore, we observe a significant increase in sample efficiency. With only 20% of the training examples, TRE matches the performance of our baselines and our model trained from scratch on 100% of the TACRED dataset. We open-source our trained models, experiments, and source code.

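Motivation and Goals

Reduce the reliance of relation extraction on explicit linguistic feature engineering.
Introduce TRE, a Transformer-based model that uses pre-trained language representations for relation classification.
Show that unsupervised pre-training improves performance and sample efficiency on standard benchmarks.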

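Proposed Method

Uses a unidirectional, decoder-only Transformer architecture to process structured input for relation extraction.
Encodes the relation arguments and the sentence in an input representation built from BPE subword tokens and task-specific delimiters.
Pre-trains the model on plain text with a language modeling objective, then fine-tunes it on relation extraction with an auxiliary LM objective.
During fine-tuning, predicts the relation label from the final Transformer state with a linear softmax classifier; the auxiliary LM objective is weighted by a coefficient lambda.
Experiments with entity masking strategies (UNK, NE, GR, NE+GR) to study their effects on generalization and regularization.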
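
The input construction, the masking strategies, and the weighted fine-tuning objective listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the special-token spellings (`<start>`, `<delim>`, `<clf>`, `<unk>`), the ordering "arguments first, then sentence", the `SUBJ`/`OBJ` role names, the `Example` class, and the sample sentence are all hypothetical. BPE is skipped, so the sketch works on word-level tokens, and it assumes the two entity spans do not overlap.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical special-token spellings; the summary only states that
# "task-specific delimiters" are used.
START, DELIM, CLF = "<start>", "<delim>", "<clf>"


@dataclass
class Example:
    tokens: List[str]       # word-level tokens (BPE is skipped in this sketch)
    head: Tuple[int, int]   # [start, end) span of the head (subject) mention
    tail: Tuple[int, int]   # [start, end) span of the tail (object) mention
    head_type: str          # named-entity type, e.g. "PERSON"
    tail_type: str


def mask_token(strategy: str, ne_type: str, role: str) -> str:
    """Return the placeholder for one mention under a masking strategy."""
    if strategy == "UNK":
        return "<unk>"             # generic unknown token
    if strategy == "NE":
        return ne_type             # named-entity type only
    if strategy == "GR":
        return role                # grammatical role only
    if strategy == "NE+GR":
        return f"{role}-{ne_type}"  # both, joined (assumed spelling)
    raise ValueError(f"unknown masking strategy: {strategy}")


def apply_masking(ex: Example, strategy: str = "none"):
    """Return (head_tokens, tail_tokens, sentence_tokens) after masking."""
    if strategy == "none":
        return (ex.tokens[ex.head[0]:ex.head[1]],
                ex.tokens[ex.tail[0]:ex.tail[1]],
                list(ex.tokens))
    head_mask = mask_token(strategy, ex.head_type, "SUBJ")
    tail_mask = mask_token(strategy, ex.tail_type, "OBJ")
    sentence = list(ex.tokens)
    # Replace the later span first so the earlier span's indices stay valid.
    spans = sorted([(ex.head, head_mask), (ex.tail, tail_mask)],
                   key=lambda item: item[0][0], reverse=True)
    for (start, end), mask in spans:
        sentence[start:end] = [mask]
    return [head_mask], [tail_mask], sentence


def build_input(ex: Example, strategy: str = "none") -> List[str]:
    """Serialize the relation arguments and the sentence into one sequence."""
    head, tail, sentence = apply_masking(ex, strategy)
    return [START, *head, DELIM, *tail, DELIM, *sentence, CLF]


def joint_loss(relation_loss: float, lm_loss: float, lam: float) -> float:
    """Fine-tuning objective: relation loss plus lambda-weighted LM loss."""
    return relation_loss + lam * lm_loss


ex = Example(tokens="Steve Jobs co-founded Apple in 1976 .".split(),
             head=(0, 2), tail=(3, 4),
             head_type="PERSON", tail_type="ORGANIZATION")
print(build_input(ex))
print(build_input(ex, "NE+GR"))
```

With `strategy="NE+GR"` the model never sees the surface form of either entity, only its type and role. That is the setting the summary associates with better generalization to unseen entities.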

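Experimental Results

Research Questions

RQ1: Does pre-training language representations improve relation extraction performance without explicit linguistic features?
RQ2: How does TRE compare with state-of-the-art models on TACRED and SemEval 2010 Task 8?
RQ3: How does entity masking affect generalization and sample efficiency?
RQ4: How sample-efficient is TRE relative to the baselines when training data is limited?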

Key Findings

TACRED (test set)
System	P	R	F1
LR † Zhang et al. (2017)	72.0	47.8	57.5
CNN † Zhang et al. (2017)	72.1	50.3	59.2
Tree-LSTM † Zhang et al. (2018)	66.0	59.2	62.4
PA-LSTM † Zhang et al. (2018)	65.7	64.5	65.1
C-GCN † Zhang et al. (2018)	69.9	63.3	66.4
TRE (ours)	70.1	65.0	67.4

SemEval 2010 Task 8 (test set)
System	P	R	F1
SVM † Rink and Harabagiu (2010)	–	–	82.2
PA-LSTM † Zhang et al. (2018)	–	–	82.7
C-GCN † Zhang et al. (2018)	–	–	84.8
DRNN † Xu et al. (2016)	–	–	86.1
BRCNN † Cai et al. (2016)	–	–	86.3
PCNN Zeng et al. (2015)	–	–	86.6
TRE (ours)	–	–	87.1 (±0.16)
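
As a quick sanity check on the table above, F1 can be recomputed from the P and R columns as their harmonic mean. The sketch below does this for the six rows that report all three values (the TACRED results). The name `TACRED_ROWS` is illustrative. Small deviations from the reported F1 are expected, presumably because the published P and R are already rounded to one decimal place.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same scale as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F1) copied from the rows of the table that list P and R.
TACRED_ROWS = {
    "LR": (72.0, 47.8, 57.5),
    "CNN": (72.1, 50.3, 59.2),
    "Tree-LSTM": (66.0, 59.2, 62.4),
    "PA-LSTM": (65.7, 64.5, 65.1),
    "C-GCN": (69.9, 63.3, 66.4),
    "TRE": (70.1, 65.0, 67.4),
}

for name, (p, r, reported) in TACRED_ROWS.items():
    print(f"{name:10s} recomputed={f1_score(p, r):.2f} reported={reported}")
```

Every recomputed value lands within 0.1 of the reported F1, so the flattened rows appear to have been extracted consistently.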

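TRE achieves state-of-the-art F1 on TACRED (67.4) and SemEval 2010 Task 8 (87.1).
Pre-trained language representations substantially improve performance, especially when entities are not masked, which points to a regularizing effect of pre-training.
Entity masking with NE+GR yields strong performance, suggesting that the language representations capture informative features similar to entity type and role information.
TRE is markedly sample-efficient: with only 20% of the TACRED training examples, it matches the performance of the baselines and of the same model trained from scratch on 100% of the data.
Unmasked entities can lead to overfitting; masking strategies help the model generalize to unseen entities.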
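
A sample-efficiency experiment of the kind described above needs reduced training sets, such as a 20% subset. One way to build them is sketched below; it is an assumption for illustration, not necessarily how the authors sampled. The function `stratified_subsample` and the `(text, label)` tuple layout are hypothetical. Sampling is done per relation label so that rare relations keep at least one example.

```python
import random
from collections import defaultdict
from typing import List, Sequence, Tuple


def stratified_subsample(examples: Sequence[Tuple[str, str]],
                         ratio: float,
                         seed: int = 0) -> List[Tuple[str, str]]:
    """Keep roughly `ratio` of the (text, label) examples of each label."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    by_label = defaultdict(list)
    for example in examples:
        by_label[example[1]].append(example)
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    subset: List[Tuple[str, str]] = []
    for label in sorted(by_label):
        group = by_label[label]
        keep = max(1, round(len(group) * ratio))  # never drop a label entirely
        subset.extend(rng.sample(group, keep))
    return subset


# Toy data: 50 examples of one relation and 100 of another.
data = [(f"sentence {i}", "per:title") for i in range(50)]
data += [(f"sentence {i}", "no_relation") for i in range(50, 150)]
small = stratified_subsample(data, ratio=0.2)
print(len(small))
```

Training the same model on subsets of increasing size and comparing F1 against the full-data baselines gives the kind of comparison behind the 20% claim.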

This review was generated by AI and reviewed by human editors.