QUICK REVIEW

[Paper Review] A Unified MRC Framework for Named Entity Recognition

Xiaoya Li, Jingrong Feng|arXiv (Cornell University)|Oct 25, 2019

Topic Modeling | References: 52 | Cited by: 51

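One-sentence summary

The paper reformulates both flat and nested NER as a machine reading comprehension task, extracting spans with one natural language query per entity type, and reaches state-of-the-art results on both nested and flat NER datasets.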

ABSTRACT

The task of named entity recognition (NER) is normally divided into nested NER and flat NER depending on whether named entities are nested or not. Models are usually separately developed for the two tasks, since sequence labeling models, the most widely used backbone for flat NER, are only able to assign a single label to a particular token, which is unsuitable for nested NER where a token may be assigned several labels. In this paper, we propose a unified framework that is capable of handling both flat and nested NER tasks. Instead of treating the task of NER as a sequence labeling problem, we propose to formulate it as a machine reading comprehension (MRC) task. For example, extracting entities with the PER label is formalized as extracting answer spans to the question "which person is mentioned in the text?". This formulation naturally tackles the entity overlapping issue in nested NER: the extraction of two overlapping entities for different categories requires answering two independent questions. Additionally, since the query encodes informative prior knowledge, this strategy facilitates the process of entity extraction, leading to better performances for not only nested NER, but flat NER. We conduct experiments on both nested and flat NER datasets. Experimental results demonstrate the effectiveness of the proposed formulation. We are able to achieve vast amount of performance boost over current SOTA models on nested NER datasets, i.e., +1.28, +2.55, +5.44, +6.37, respectively on ACE04, ACE05, GENIA and KBP17, along with SOTA results on flat NER datasets, i.e., +0.24, +1.95, +0.21, +1.49 respectively on English CoNLL 2003, English OntoNotes 5.0, Chinese MSRA, Chinese OntoNotes 4.0.

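Motivation and goals

Handle nested (overlapping) and flat NER in a single framework.
Use machine reading comprehension to inject prior knowledge about entity categories into the system through queries.
Improve extraction accuracy on nested and flat NER datasets with an end-to-end trainable model.
Demonstrate significant empirical gains over existing state-of-the-art models on a diverse set of benchmarks.
Analyze how query construction and data efficiency affect NER performance.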

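Proposed method

Formulate NER as a SQuAD-style MRC task: assign a natural language query q_y to each entity type y and extract the answer spans from the context X.
Use BERT as the backbone to encode the concatenated query and context, producing token representations for span extraction.
Select spans with a start/end scheme built on two binary classifiers that predict, for each token, whether it is a possible start and whether it is a possible end, so that one query can yield multiple spans.
Train an additional start-end matching classifier that pairs predicted starts and ends into valid entity spans, minimizing a combined loss.
Jointly train L_start, L_end, and L_span on top of the pretrained BERT representations, which makes end-to-end optimization possible.
Generate queries from annotation guidelines (and explore alternatives) to encode prior knowledge about entity categories.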
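
The steps above can be illustrated with a minimal Python sketch of the query-plus-context input and the span decoding step. It is not the authors' implementation: the helper names `build_input` and `decode_spans`, the 0.5 threshold, the ORG and LOC query wordings, and the hand-written probabilities are all assumptions made for illustration. Only the PER query comes from the abstract. In a real system the probabilities would come from the two binary classifiers and the start-end matching classifier on top of BERT.

```python
from typing import Dict, List, Tuple

# One natural language query per entity type. The PER wording is quoted from
# the abstract; the ORG and LOC wordings are hypothetical examples.
QUERIES: Dict[str, str] = {
    "PER": "which person is mentioned in the text?",
    "ORG": "which organization is mentioned in the text?",
    "LOC": "which location is mentioned in the text?",
}


def build_input(query: str, context_tokens: List[str]) -> List[str]:
    """Concatenate query and context BERT-style: [CLS] query [SEP] context [SEP]."""
    return ["[CLS]"] + query.split() + ["[SEP]"] + context_tokens + ["[SEP]"]


def decode_spans(
    start_probs: List[float],
    end_probs: List[float],
    match_probs: List[List[float]],
    threshold: float = 0.5,  # assumed cutoff, not taken from the paper
) -> List[Tuple[int, int]]:
    """Pair candidate starts and ends into spans (inclusive token indices).

    A token is a candidate start (end) if its start (end) probability exceeds
    the threshold; a (start, end) pair is kept if start <= end and the
    matching probability for that pair also exceeds the threshold.
    """
    starts = [i for i, p in enumerate(start_probs) if p > threshold]
    ends = [j for j, p in enumerate(end_probs) if p > threshold]
    return [
        (i, j)
        for i in starts
        for j in ends
        if i <= j and match_probs[i][j] > threshold
    ]


# Toy nested example: "Bank of China" (ORG) contains "China" (LOC).
context = ["Bank", "of", "China", "opened"]
n = len(context)

# Made-up classifier outputs for the ORG query.
org_match = [[0.0] * n for _ in range(n)]
org_match[0][2] = 0.9
org_spans = decode_spans([0.9, 0.1, 0.2, 0.0], [0.1, 0.0, 0.8, 0.1], org_match)

# Made-up classifier outputs for the LOC query.
loc_match = [[0.0] * n for _ in range(n)]
loc_match[2][2] = 0.8
loc_spans = decode_spans([0.1, 0.0, 0.7, 0.0], [0.0, 0.0, 0.9, 0.0], loc_match)

print("ORG:", [" ".join(context[i : j + 1]) for i, j in org_spans])
print("LOC:", [" ".join(context[i : j + 1]) for i, j in loc_spans])
```

Because each entity type is handled by its own query, the overlapping spans "Bank of China" and "China" are extracted in two independent passes and never compete for a single token label.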

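Experimental results

Research questions

RQ1: Can framing NER as a unified MRC problem handle both flat and nested NER without separate models?
RQ2: Does introducing natural language queries that carry prior knowledge improve extraction, especially for overlapping entities?
RQ3: How does the query construction strategy affect NER performance and data efficiency?
RQ4: In a zero-shot setting, how well does the BERT-MRC approach transfer to unseen label sets?
RQ5: How much of the performance gain comes from the MRC formulation and how much from pretraining?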

Key findings

Model	ACE04 Precision	ACE04 Recall	ACE04 F1	ACE05 Precision	ACE05 Recall	ACE05 F1	GENIA Precision	GENIA Recall	GENIA F1	KBP17 Precision	KBP17 Recall	KBP17 F1
Hyper-Graph	73.6	71.8	72.7	70.6	70.4	70.5	77.7	71.8	74.6	76.2	73.0	72.8
Seg-Graph	78.0	72.4	75.1	76.8	72.3	74.5	78.07	76.45	77.25	-	-	-
ARN	-	-	-	-	-	-	-	-	-	-	-	-
Path-BERT	-	-	-	82.98	82.42	82.70	78.07	76.45	77.25	-	-	-
DYGIE	-	-	-	-	-	-	-	-	-	-	-	-
Seq2seq-BERT	-	-	84.40	-	-	-	-	-	-	-	-	-
BERT-MRC	85.05	86.32	85.98	87.16	86.59	86.88	85.18	81.12	83.75	82.33	77.61	80.97

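BERT-MRC achieves state-of-the-art or near-SOTA results on the nested NER datasets ACE04, ACE05, GENIA, and KBP17, with significant F1 gains over earlier models.
On nested NER, BERT-MRC reaches F1 scores of 85.98 (ACE04), 86.88 (ACE05), 83.75 (GENIA), and 80.97 (KBP17), exceeding the previous SOTA by +1.28, +2.55, +5.44, and +6.37 respectively.
On flat NER, BERT-MRC improves F1 by +0.24 on English CoNLL 2003, +1.95 on English OntoNotes 5.0, +0.21 on Chinese MSRA, and +1.49 on Chinese OntoNotes 4.0.
Zero-shot experiments show that BERT-MRC generalizes to unseen labels better than a tagging-based baseline, although absolute performance remains below that obtained on seen labels.
Query construction affects performance: queries built from annotation guideline notes achieve the highest F1, while synonym-based and keyword-based variants also improve over simple templates.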


This review was generated by AI and checked by a human editor.