QUICK REVIEW

[Paper Review] Downstream Model Design of Pre-trained Language Model for Relation Extraction Task

Cheng Li, Ye Tian | arXiv (Cornell University) | Apr 8, 2020

Topic Modeling | References: 31 | Cited by: 32

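One-Sentence Summary

This paper designs a PLM-based downstream model for relation extraction that uses separate head and tail embeddings, an asymmetric kernel inner product to compute relation tendency scores, and a Sigmoid-based multi-label loss to handle overlapping and multiple relations.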

ABSTRACT

Supervised relation extraction methods based on deep neural network play an important role in the recent information extraction field. However, at present, their performance still fails to reach a good level due to the existence of complicated relations. On the other hand, recently proposed pre-trained language models (PLMs) have achieved great success in multiple tasks of natural language processing through fine-tuning when combined with the model of downstream tasks. However, original standard tasks of PLM do not include the relation extraction task yet. We believe that PLMs can also be used to solve the relation extraction problem, but it is necessary to establish a specially designed downstream task model or even loss function for dealing with complicated relations. In this paper, a new network architecture with a special loss function is designed to serve as a downstream model of PLMs for supervised relation extraction. Experiments have shown that our method significantly exceeded the current optimal baseline models across multiple public datasets of relation extraction.

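Research Motivation and Goals

Address the limitations of existing relation extraction methods when they are combined with PLMs.
Propose a downstream architecture that uses PLMs for relation extraction, with a dedicated representation and loss function.
Enable prediction of multiple and overlapping relations within the same sentence.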

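Proposed Method

Use a pre-trained language model (BERT) to obtain token embeddings, augmented with contextual information from the CLS token.
Extract two entity-centric embeddings (head and tail) from different BERT layers to capture relational cues.
Compute an asymmetric kernel inner product between the head and tail embeddings to form a relation tendency score matrix for each relation type.
Apply a Sigmoid activation to each token pair to obtain probabilities, then aggregate over entity pairs using entity masks to produce relation probabilities.
For each relation type, average the binary cross-entropy loss over the masked entity pairs, then sum over relation types to obtain the final loss.
Optionally integrate an NER component (Bi-LSTM/CRF) to form a joint extraction model, although this is not the focus of the paper.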
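
The scoring and loss steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the bilinear form `head[i]^T W_r tail[j]` is one common way to realize an asymmetric kernel inner product, and the names `relation_scores` and `masked_bce_loss`, the per-relation kernel `W_r`, and the token-pair mask layout are all hypothetical. In practice the embeddings would come from BERT; here they are passed in as plain nested lists.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def relation_scores(head, tail, kernels):
    """Score every token pair for every relation type.

    head, tail: [n_tokens][d] embeddings (assumed separate representations).
    kernels:    [n_relations][d][d], one hypothetical kernel W_r per relation.
    Returns scores[r][i][j] = head[i]^T W_r tail[j].
    """
    n, d = len(head), len(head[0])
    scores = []
    for W in kernels:
        # Project each tail vector through the kernel: (W tail[j])[a].
        projected = [
            [sum(W[a][b] * tail[j][b] for b in range(d)) for a in range(d)]
            for j in range(n)
        ]
        scores.append([
            [sum(head[i][a] * projected[j][a] for a in range(d)) for j in range(n)]
            for i in range(n)
        ])
    return scores


def masked_bce_loss(scores, labels, mask, eps=1e-12):
    """Mean binary cross-entropy over masked pairs, summed over relations.

    labels[r][i][j] is 1 if relation r holds for pair (i, j), else 0.
    mask[i][j] is True for pairs that count toward the loss (assumed layout).
    """
    total = 0.0
    for r, score_r in enumerate(scores):
        losses = []
        for i, row in enumerate(score_r):
            for j, s in enumerate(row):
                if not mask[i][j]:
                    continue
                p = sigmoid(s)  # independent probability per relation
                y = labels[r][i][j]
                losses.append(-(y * math.log(p + eps)
                                + (1 - y) * math.log(1.0 - p + eps)))
        if losses:
            total += sum(losses) / len(losses)
    return total
```

Because `W_r` need not be symmetric and the head and tail embeddings are separate, the score for pair (i, j) generally differs from the score for (j, i), which lets the sketch distinguish relation direction. Using an independent Sigmoid per relation, rather than a softmax across relations, is what allows several relations to be active for the same pair.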

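Experimental Results

Research Questions

RQ1: Can a specially designed downstream task model and loss function enable PLMs to handle complicated relations in relation extraction?
RQ2: Does decomposing the entity representation and using an asymmetric relation kernel improve disambiguation on data with overlapping or multiple relations?
RQ3: On standard datasets (SemEval, NYT, WebNLG), how does the proposed PLM-based downstream method perform in complicated-relation scenarios compared with contemporaneous baselines?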

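Key Findings

Compared with several post-BERT baselines, the proposed method achieves state-of-the-art Micro-F1 scores on SemEval, NYT, and WebNLG.
On SemEval, the model reaches a Micro-F1 of 91.0 (All), exceeding the best baseline's 89.5.
On NYT, the model reaches a Micro-F1 of 89.8 (All), exceeding the best baseline's 87.5.
On WebNLG, the model reaches a Micro-F1 of 96.3 (All), exceeding the best baseline's 88.8.
The model remains robust on complex overlapping relations (EPO) and multiple relations, with significant gains over the baselines in most scenarios.
The architecture allows multiple relations to be predicted within a single sentence, including overlapping relations between the same entity pair.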
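
For readers unfamiliar with the metric reported above, the following sketch shows the standard Micro-F1 computation over predicted relation triples. It is an assumption that the paper's evaluation follows this standard definition; the summary does not spell out the exact protocol. The function name `micro_f1` and the example triples are hypothetical.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 over relation triples.

    gold, pred: lists of sets of (head, relation, tail) triples,
    one set per sentence. Counts are pooled across all sentences
    before precision and recall are computed.
    """
    true_pos = sum(len(g & p) for g, p in zip(gold, pred))
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentence with two gold relations on the same entity pair;
# the prediction recovers only one of them.
gold = [{("A", "born_in", "B"), ("A", "lives_in", "B")}]
pred = [{("A", "born_in", "B")}]
score = micro_f1(gold, pred)
```

In this example precision is 1.0 and recall is 0.5, so a model that misses one of two overlapping relations on the same entity pair is penalized through recall. This is why multi-label prediction matters for the overlapping-relation settings.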

This review was generated by AI and reviewed by human editors.