[Paper Review] Relation Extraction: A Survey

Sachin Pawar, Girish Keshav Palshikar | arXiv (Cornell University) | Dec 14, 2017
Topic Modeling | References: 115 | Cited by: 76
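One-Sentence Summary

A comprehensive survey of relation extraction (RE) techniques. It covers supervised, semi-supervised, and unsupervised methods, Open Information Extraction (OIE), and distant supervision, with an emphasis on feature-based and kernel-based methods and on common RE datasets such as ACE.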

ABSTRACT

With the advent of the Internet, a large amount of digital text is generated every day in the form of news articles, research publications, blogs, question answering forums and social media. It is important to develop techniques for extracting information automatically from these documents, as a lot of important information is hidden within them. This extracted information can be used to improve access to and management of knowledge hidden in large text corpora. Several applications, such as Question Answering and Information Retrieval, would benefit from this information. Entities like persons and organizations form the most basic unit of the information. Occurrences of entities in a sentence are often linked through well-defined relations; e.g., occurrences of a person and an organization in a sentence may be linked through relations such as employed at. The task of Relation Extraction (RE) is to identify such relations automatically. In this paper, we survey several important supervised, semi-supervised and unsupervised RE techniques. We also cover the paradigms of Open Information Extraction (OIE) and Distant Supervision. Finally, we describe some of the recent trends in RE techniques and possible future research directions. This survey would be useful for three kinds of readers: i) newcomers to the field who want to quickly learn about RE; ii) researchers who want to know how the various RE techniques evolved over time and what the possible future research directions are; and iii) practitioners who just need to know which RE technique works best in various settings.
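The abstract frames RE as deciding how entity occurrences in the same sentence are related. A minimal sketch of that framing, assuming entity mentions have already been detected: every ordered pair of mentions whose types appear in a hypothetical `ALLOWED_TYPE_PAIRS` table becomes a candidate relation instance, which a downstream classifier (omitted here) would label with a relation such as employed at, or with NONE. The `Mention` and `RelationInstance` classes and the type table are illustrative assumptions, not taken from the survey.

```python
from dataclasses import dataclass
from itertools import permutations

@dataclass(frozen=True)
class Mention:
    text: str
    etype: str  # coarse entity type, e.g. "PER", "ORG", "LOC" (ACE-style labels, assumed)
    start: int  # index of the first token
    end: int    # index one past the last token

@dataclass(frozen=True)
class RelationInstance:
    tokens: tuple
    arg1: Mention
    arg2: Mention

# Hypothetical constraint: type pairs that may hold a relation, in (arg1, arg2) order.
ALLOWED_TYPE_PAIRS = {("PER", "ORG"), ("PER", "LOC"), ("ORG", "LOC")}

def candidate_instances(tokens, mentions):
    """Turn every type-compatible ordered mention pair into a candidate relation instance."""
    return [
        RelationInstance(tuple(tokens), m1, m2)
        for m1, m2 in permutations(mentions, 2)
        if (m1.etype, m2.etype) in ALLOWED_TYPE_PAIRS
    ]

tokens = "Alice works at Acme Corp in Paris .".split()
mentions = [
    Mention("Alice", "PER", 0, 1),
    Mention("Acme Corp", "ORG", 3, 5),
    Mention("Paris", "LOC", 6, 7),
]
for inst in candidate_instances(tokens, mentions):
    print(inst.arg1.text, "->", inst.arg2.text)
```

This prints three candidates: Alice -> Acme Corp, Alice -> Paris, and Acme Corp -> Paris. Filtering by entity-type pairs is one simple way to shrink the candidate set before classification; without it, n mentions would yield n(n-1) ordered pairs.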

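Research Motivation and Goals

  • Provide a structured overview of the relation extraction task and its challenges.
  • Summarize supervised, semi-supervised, and unsupervised RE methods.
  • Discuss the Open Information Extraction and distant supervision paradigms.
  • Review kernel-based and feature-based methods for RE and how they are evaluated.
  • Highlight RE datasets and future research directions.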

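Approach

  • Categorizes RE techniques into supervised, semi-supervised, unsupervised, Open Information Extraction, and distant supervision approaches.
  • Details feature-based methods, covering lexical, syntactic, and semantic features.
  • Explains kernel-based RE using sequence, syntactic tree, dependency tree, and dependency graph kernels.
  • Describes representations of relation instances (sequences, parse trees, augmented dependency trees) and the kernels defined over them.
  • Introduces composite kernels, which combine several sub-kernels to improve RE performance.
  • Discusses datasets (e.g., ACE) and evaluation considerations in RE.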
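To make the feature-based and composite-kernel ideas above concrete, here is an illustrative sketch under stated assumptions, not the survey's own formulation. The hypothetical `lexical_features` builds a sparse feature vector from entity types, mention order, the words between the mentions, and a bucketed gap; `ngram_kernel` compares the in-between word sequences by their shared n-grams; `composite_kernel` takes a weighted sum of the two after cosine normalization. All function names, feature templates, and the weight `alpha=0.6` are assumptions chosen for illustration.

```python
from collections import Counter
from math import sqrt

def lexical_features(tokens, arg1, arg2):
    """Sparse lexical features for one relation instance.

    Each arg is a hypothetical (start, end, entity_type) triple with an exclusive end;
    the two mentions are assumed not to overlap.
    """
    (s1, e1, t1), (s2, e2, t2) = arg1, arg2
    between = tokens[min(e1, e2):max(s1, s2)]  # words strictly between the mentions
    feats = Counter({f"types={t1}|{t2}": 1, f"order={'12' if s1 < s2 else '21'}": 1})
    feats.update(f"bw={w.lower()}" for w in between)  # bag of in-between words
    feats[f"gap={min(len(between), 4)}"] += 1  # bucketed token distance
    return feats, [w.lower() for w in between]

def linear_kernel(f, g):
    """Dot product of two sparse Counter vectors (missing keys count as 0)."""
    return sum(v * g[k] for k, v in f.items())

def ngram_kernel(a, b, max_n=2):
    """Dot product of word n-gram count vectors (n = 1..max_n) of two sequences."""
    def grams(seq):
        return Counter(tuple(seq[i:i + n])
                       for n in range(1, max_n + 1)
                       for i in range(len(seq) - n + 1))
    return linear_kernel(grams(a), grams(b))

def normalize(k, x, y):
    """Cosine-normalize kernel k; returns 0.0 if either input has zero self-similarity."""
    d = sqrt(k(x, x) * k(y, y))
    return k(x, y) / d if d else 0.0

def composite_kernel(x, y, alpha=0.6):
    """Weighted sum of two normalized sub-kernels; each x, y is (features, between_words)."""
    (fx, bx), (fy, by) = x, y
    return alpha * normalize(linear_kernel, fx, fy) + (1 - alpha) * normalize(ngram_kernel, bx, by)

x = lexical_features("Alice works at Acme Corp .".split(), (0, 1, "PER"), (3, 5, "ORG"))
y = lexical_features("Bob works at Globex .".split(), (0, 1, "PER"), (3, 4, "ORG"))
z = lexical_features("Paris is in France .".split(), (0, 1, "LOC"), (3, 4, "LOC"))
print(composite_kernel(x, y), composite_kernel(x, z))
```

Here x and y express the same pattern with different entities and score 1.0, while x and z share only the order and gap features and score about 0.24. Because a non-negative weighted sum of valid kernels is itself a valid kernel, a combination like this can be used in any kernel machine that accepts a precomputed Gram matrix, such as an SVM.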

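Experimental Results

Research Questions

  • RQ1: What are the main RE paradigms, and how do they differ in input, output, and level of supervision?
  • RQ2: Which features and kernel representations effectively capture relational information between entity mentions?
  • RQ3: How do different tree and graph kernels compare on RE tasks?
  • RQ4: Which datasets and evaluation practices have shaped the development of RE, and what are the current challenges and future directions?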
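RQ4 concerns evaluation practice. A common convention for ACE-style RE, assumed here rather than taken from the survey, is micro-averaged precision, recall, and F1 in which precision counts only the relations the system asserted and recall counts only the relations present in the gold data, so the dominant NONE (no-relation) class cannot inflate the score. The function name `micro_prf` is hypothetical, and the labels EMP-ORG and PHYS are ACE-style relation types used purely as toy data.

```python
def micro_prf(gold, pred, negative="NONE"):
    """Micro-averaged precision, recall and F1 that ignore the negative (no-relation) label."""
    tp = sum(1 for g, p in zip(gold, pred) if p == g and p != negative)
    n_pred = sum(1 for p in pred if p != negative)  # relations the system asserted
    n_gold = sum(1 for g in gold if g != negative)  # relations actually present
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = ["EMP-ORG", "NONE", "NONE", "PHYS", "NONE"]
pred = ["EMP-ORG", "NONE", "PHYS", "NONE", "NONE"]
print(micro_prf(gold, pred))  # (0.5, 0.5, 0.5)
```

On this toy list, a classifier that always predicts NONE would reach 3/5 accuracy yet score 0 on all three metrics, which is why plain accuracy is a poor yardstick when most candidate pairs hold no relation.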

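Key Findings

  • Feature-based methods rely on carefully engineered lexical, syntactic, and semantic features to classify relation instances.
  • Kernel-based methods reduce the burden of explicit feature engineering by measuring similarity directly between representations such as sequences, syntactic trees, and dependency structures.
  • A range of RE kernels (sequence, syntactic tree, dependency tree, dependency path) is explored; in evaluations on the ACE benchmarks, composite kernels generally perform best.
  • The ACE 2003/2004 datasets are the core RE benchmarks, and the survey discusses how representation and context affect performance.
  • Class imbalance and domain dependence are highlighted as challenges for supervised RE methods.
  • Open Information Extraction and distant supervision are identified as important trends shaping scalable RE.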
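The dependency-path kernels mentioned in the findings can be sketched in a few lines, here in the style of Bunescu and Mooney's shortest-path dependency kernel (2005). Each path between the two entities is a sequence of feature sets (for nodes: word, POS tag, coarse class, entity type; for edges: arrow direction). Paths of different length get similarity 0; otherwise the kernel is the product of the number of shared features at each position. The toy paths and their feature sets are illustrative assumptions, extracting real paths would require a dependency parser (omitted), and `math.prod` needs Python 3.8 or later.

```python
from math import prod

def sdp_kernel(x, y):
    """Shortest-path dependency kernel: product of shared-feature counts per position."""
    if len(x) != len(y):
        return 0  # paths of different length are treated as dissimilar
    return prod(len(a & b) for a, b in zip(x, y))

# Toy paths between two entities; arrows point from dependent to head (assumed notation).
x = [{"his", "PRP", "PERSON"}, {"->"}, {"actions", "NNS", "Noun"}, {"<-"},
     {"in", "IN"}, {"<-"}, {"Brcko", "NNP", "Noun", "LOCATION"}]
y = [{"his", "PRP", "PERSON"}, {"->"}, {"arrival", "NN", "Noun"}, {"<-"},
     {"in", "IN"}, {"<-"}, {"Beijing", "NNP", "Noun", "LOCATION"}]
print(sdp_kernel(x, y))  # 3 * 1 * 1 * 1 * 2 * 1 * 3 = 18
```

The product form means a single position with no shared features (for example, opposite arrow directions) drives the whole similarity to 0. This makes the kernel strict but also sparse, one reason composite kernels that add smoother sub-kernels tend to help.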

This review was generated by AI and reviewed by human editors.