QUICK REVIEW

[Paper Review] GPT-NER: Named Entity Recognition via Large Language Models

Shuhe Wang, Xiaofei Sun|arXiv (Cornell University)|Apr 20, 2023
Topic Modeling | Cited by 145
One-Sentence Summary

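GPT-NER reframes NER as a text-generation task for LLMs. It uses entity-marked outputs with self-verification to reduce hallucination and achieves results competitive with supervised baselines, especially in low-resource settings.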

ABSTRACT

Despite the fact that large language models (LLMs) have achieved SOTA performance on a variety of NLP tasks, their performance on NER is still significantly below supervised baselines. This is due to the gap between the two tasks: NER is by nature a sequence labeling task, while LLMs are text-generation models. In this paper, we propose GPT-NER to resolve this issue. GPT-NER bridges the gap by transforming the sequence labeling task into a generation task that can be easily adapted by LLMs. For example, the task of finding location entities in the input text "Columbus is a city" is transformed into generating the text sequence "@@Columbus## is a city", where the special tokens @@ and ## mark the entity to extract. To efficiently address the "hallucination" issue of LLMs, where LLMs have a strong inclination to over-confidently label NULL inputs as entities, we propose a self-verification strategy that prompts LLMs to ask themselves whether the extracted entities belong to a labeled entity tag. We conduct experiments on five widely adopted NER datasets, and GPT-NER achieves performance comparable to fully supervised baselines, which, as far as we are aware, is the first time this has been achieved. More importantly, we find that GPT-NER exhibits a greater ability in low-resource and few-shot setups: when the amount of training data is extremely scarce, GPT-NER performs significantly better than supervised models. This demonstrates the capabilities of GPT-NER in real-world NER applications where the number of labeled examples is limited.
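The abstract's "Columbus is a city" example can be made concrete with a minimal Python sketch of the round trip between token-level spans and the @@...## generation format. The helper names `mark_entities` and `extract_entities` are hypothetical illustrations, not code from the paper. The sketch assumes flat (non-overlapping) entities and assumes the LLM copies the input text unchanged apart from the markers.

```python
import re
from typing import List, Tuple

START, END = "@@", "##"  # entity markers used in the GPT-NER output format

def mark_entities(tokens: List[str], spans: List[Tuple[int, int]]) -> str:
    """Hypothetical helper: wrap each entity span [start, end) of token indices in @@ ... ##."""
    out = list(tokens)
    for start, end in spans:
        out[start] = START + out[start]
        out[end - 1] = out[end - 1] + END
    return " ".join(out)

def extract_entities(generated: str) -> List[str]:
    """Hypothetical helper: return every entity string the model wrapped in @@ ... ##."""
    # Non-greedy match so that two entities in one sentence are not merged.
    return re.findall(r"@@(.+?)##", generated)

print(mark_entities(["Columbus", "is", "a", "city"], [(0, 1)]))  # @@Columbus## is a city
print(extract_entities("@@New York## is bigger than @@Columbus##"))  # ['New York', 'Columbus']
```

Nested entities would need a richer encoding than a single pair of markers per span, so this sketch covers only the flat case.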

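Motivation and Goals

  • Bridge the gap between NER and LLM generation by reframing NER as a generation task.
  • Design prompting and retrieval strategies that supply the LLM with well-structured demonstrations.
  • Mitigate LLM hallucination in NER with a self-verification step.
  • Evaluate GPT-NER on flat and nested NER benchmarks and analyze its performance in low-resource settings.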

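Proposed Method

  • Convert NER into a text-generation task by surrounding each entity with the special tokens @@ and ##, so that the model generates a labeled sequence.
  • Build the prompt from a task description, few-shot demonstrations, and the input sentence to guide the LLM.
  • Retrieve demonstrations with token-level nearest-neighbor (kNN) search so that the examples are relevant to the input.
  • Add a self-verification step in which, before producing the final output, the model checks whether each extracted entity belongs to the target label.
  • Use GPT-3 (davinci-003) as the backbone model with fixed generation settings, and evaluate on standard NER datasets.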
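The three-part prompt from the method list (task description, few-shot demonstrations, input sentence) can be assembled roughly as follows. The function name `build_ner_prompt` and the exact wording of the task description and the "Input:/Output:" layout are illustrative assumptions, not the paper's verbatim template.

```python
from typing import List, Tuple

def build_ner_prompt(entity_type: str,
                     demos: List[Tuple[str, str]],
                     sentence: str) -> str:
    """Hypothetical helper: task description + few-shot demos + input sentence.

    `demos` holds (plain sentence, @@/##-marked sentence) pairs, e.g. chosen
    by a kNN retriever. The wording below is assumed, not taken from the paper.
    """
    # Part 1: task description.
    parts = [
        f"The task is to label {entity_type} entities in the given sentence. "
        f"Surround each {entity_type} entity with @@ and ##."
    ]
    # Part 2: few-shot demonstrations in the same input/output format.
    for source, marked in demos:
        parts.append(f"Input: {source}\nOutput: {marked}")
    # Part 3: the input sentence, with Output left open for the LLM to complete.
    parts.append(f"Input: {sentence}\nOutput:")
    return "\n\n".join(parts)

prompt = build_ner_prompt(
    "location",
    [("Columbus is a city", "@@Columbus## is a city")],
    "China says Taiwan spoils atmosphere for talks .",
)
print(prompt)
```

Ending the prompt with an open "Output:" line makes the marked sentence the natural continuation, which is what lets a plain completion model perform the labeling.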
Figure 1: The example of the prompt of GPT-NER. Suppose that we need to recognize location entities for the given sentence: China says Taiwan spoils atmosphere for talks . The prompt consists of three parts: (1) Task Description : It’s surrounded by a red rectangle, and instructs the GPT-3 model tha
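The self-verification step described in the method list can be sketched as a second, yes/no query per extracted candidate. The function names, the question wording, and the `ask_llm` callable (a stand-in for whatever completion API is used) are all hypothetical. The sketch only illustrates the filtering logic: candidates the model does not confirm are dropped.

```python
from typing import Callable, List

def build_verification_prompt(entity_type: str, sentence: str, candidate: str) -> str:
    """Hypothetical yes/no question asking the LLM to re-check one candidate."""
    return (
        f"The task is to verify whether the word is a {entity_type} entity "
        f"extracted from the given sentence.\n"
        f"Input: {sentence}\n"
        f"Question: Is the word \"{candidate}\" in the input sentence a "
        f"{entity_type} entity? Please answer with yes or no.\n"
        f"Answer:"
    )

def self_verify(entity_type: str, sentence: str, candidates: List[str],
                ask_llm: Callable[[str], str]) -> List[str]:
    """Keep only the candidates the LLM confirms with an answer starting with 'yes'.

    `ask_llm` is a placeholder for a real completion call (assumption).
    """
    kept = []
    for cand in candidates:
        answer = ask_llm(build_verification_prompt(entity_type, sentence, cand))
        if answer.strip().lower().startswith("yes"):
            kept.append(cand)
    return kept

# Stub LLM for illustration: it confirms only "China" and "Taiwan".
def fake_llm(prompt: str) -> str:
    return "yes" if '"China"' in prompt or '"Taiwan"' in prompt else "no"

sentence = "China says Taiwan spoils atmosphere for talks ."
print(self_verify("location", sentence, ["China", "Taiwan", "talks"], fake_llm))
```

Each candidate costs one extra LLM call, so verification trades inference cost for fewer over-confident false positives.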

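Experimental Results

Research Questions

  • RQ1: Can GPT-style generation with marked outputs compete with supervised NER baselines on flat and nested datasets?
  • RQ2: Does token-level kNN demonstration retrieval improve NER performance over random or sentence-level retrieval?
  • RQ3: Does the self-verification step reduce hallucination and improve the accuracy of NER outputs?
  • RQ4: How does GPT-NER compare with supervised models in low-resource and few-shot scenarios?
  • RQ5: What is the effect of using entity-level embeddings for demonstration retrieval on the NER task?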

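Key Findings

  • GPT-NER reaches performance comparable to supervised baselines on flat NER datasets and approaches SOTA in several settings.
  • Entity-level (token-aware) kNN retrieval of demonstrations significantly outperforms random and sentence-level retrieval.
  • Self-verification brings additional gains and improves F1 by mitigating over-confident labeling of NULL inputs as entities.
  • GPT-NER shows a strong advantage in low-resource and few-shot settings, outperforming supervised models when labeled data is scarce.
  • Performance gains persist under larger token budgets, suggesting further headroom with higher-capacity LLMs such as GPT-4.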
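The F1 gains attributed to self-verification can be made concrete with standard exact-match, span-level micro F1. This sketch assumes that metric; the paper's evaluation script may differ in details. The function name `span_f1` is hypothetical. It shows that removing a false positive raises precision without touching recall.

```python
from typing import Iterable, Set, Tuple

# (sentence_id, start, end, entity_type); a span counts only on an exact match.
Span = Tuple[int, int, int, str]

def span_f1(gold: Iterable[Span], pred: Iterable[Span]) -> Tuple[float, float, float]:
    """Hypothetical helper: micro precision, recall and F1 over exact-match spans."""
    gold_set: Set[Span] = set(gold)
    pred_set: Set[Span] = set(pred)
    tp = len(gold_set & pred_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = {(0, 0, 1, "LOC"), (0, 2, 3, "LOC")}
before = {(0, 0, 1, "LOC"), (0, 5, 6, "LOC")}  # one correct span, one hallucinated span
after = {(0, 0, 1, "LOC")}                     # hallucinated span filtered out
print(span_f1(gold, before))  # (0.5, 0.5, 0.5)
print(span_f1(gold, after))   # precision 1.0, recall 0.5, F1 about 0.667
```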
Figure 2: An example of using entity-level embeddings to retrieve few-shot demonstrations. Suppose that we need to retrieve few-shot demonstrations for the input sentence “Obama lives in Washington” with the defined LOC entity in the prompt. Step 1 Datastore Construction: We first use the
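Figure 2's token-level retrieval can be approximated by a brute-force kNN sketch. The following code assumes token embeddings are already computed by some encoder (the caption above is truncated, so the exact encoder is not filled in here). Each training sentence is scored by the best cosine similarity between any of its tokens and any query token, and the top k sentences are returned as demonstrations. All function names are hypothetical, and the ranking rule is a simplification that may differ from the paper's exact selection procedure.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def build_datastore(train_token_vecs: Dict[int, List[Vector]]) -> List[Tuple[Vector, int]]:
    """Hypothetical helper: flatten {sentence_id: [token vectors]} into (key, value) pairs."""
    return [(vec, sid) for sid, vecs in train_token_vecs.items() for vec in vecs]

def retrieve_demos(query_token_vecs: List[Vector],
                   datastore: List[Tuple[Vector, int]],
                   k: int) -> List[int]:
    """Return up to k training sentence ids, ranked by their best token-level similarity."""
    best: Dict[int, float] = {}
    for q in query_token_vecs:          # every token of the input sentence
        for vec, sid in datastore:      # brute-force nearest-neighbor scan
            score = cosine(q, vec)
            if score > best.get(sid, -math.inf):
                best[sid] = score
    ranked = sorted(best, key=lambda sid: best[sid], reverse=True)
    return ranked[:k]

store = build_datastore({0: [[1.0, 0.0], [0.0, 1.0]], 1: [[0.7, 0.7]], 2: [[-1.0, 0.0]]})
print(retrieve_demos([[1.0, 0.0]], store, k=2))  # [0, 1]
```

The double loop costs O(query tokens × datastore tokens). For a realistic training set, an approximate nearest-neighbor index would replace the scan while keeping the same scoring idea.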


This review was generated by AI and reviewed by human editors.