QUICK REVIEW

[Paper Review] Relation Extraction: A Survey

Sachin Pawar, Girish Keshav Palshikar | arXiv (Cornell University) | Dec 14, 2017

Topic Modeling | References: 115 | Cited by: 76

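One-sentence summary

A comprehensive survey of relation extraction (RE) techniques that covers supervised, semi-supervised, and unsupervised approaches, Open Information Extraction, and distant supervision, with emphasis on feature-based and kernel methods and on common RE datasets such as ACE.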

ABSTRACT

With the advent of the Internet, large amount of digital text is generated everyday in the form of news articles, research publications, blogs, question answering forums and social media. It is important to develop techniques for extracting information automatically from these documents, as lot of important information is hidden within them. This extracted information can be used to improve access and management of knowledge hidden in large text corpora. Several applications such as Question Answering, Information Retrieval would benefit from this information. Entities like persons and organizations, form the most basic unit of the information. Occurrences of entities in a sentence are often linked through well-defined relations; e.g., occurrences of person and organization in a sentence may be linked through relations such as employed at. The task of Relation Extraction (RE) is to identify such relations automatically. In this paper, we survey several important supervised, semi-supervised and unsupervised RE techniques. We also cover the paradigms of Open Information Extraction (OIE) and Distant Supervision. Finally, we describe some of the recent trends in the RE techniques and possible future research directions. This survey would be useful for three kinds of readers - i) Newcomers in the field who want to quickly learn about RE; ii) Researchers who want to know how the various RE techniques evolved over time and what are possible future research directions and iii) Practitioners who just need to know which RE technique works best in various settings.

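Motivation and objectives

Provide a structured overview of the relation extraction task and its challenges.
Summarize supervised, semi-supervised, and unsupervised RE approaches.
Discuss the paradigms of Open Information Extraction and distant supervision.
Review kernel-based and feature-based methods for RE and how they are evaluated.
Highlight RE datasets and future research directions.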

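Approach

Categorizes RE techniques into supervised, semi-supervised, unsupervised, Open IE, and distant supervision.
Describes feature-based methods in detail, using lexical, syntactic, and semantic features.
Describes kernel-based RE using sequence, syntactic-tree, dependency-tree, and dependency-graph kernels.
Explains how relation instances are represented (sequences, parse trees, augmented dependency trees) and the kernels defined over each representation.
Introduces composite kernels, which combine several sub-kernels to improve performance.
Discusses RE datasets (e.g., ACE) and evaluation considerations.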
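
To make the idea of a composite kernel concrete, here is a minimal sketch in Python. It is not taken from the survey: the `RelationInstance` class, both sub-kernels, and the weight `alpha` are hypothetical, and the sub-kernels are deliberately simple stand-ins for the sequence and tree kernels the survey covers. It only illustrates that a non-negative weighted sum of valid kernels is itself a valid kernel.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RelationInstance:
    """One candidate entity pair in a sentence (hypothetical representation)."""
    e1_type: str               # entity type of the first mention, e.g. "PER"
    e2_type: str               # entity type of the second mention, e.g. "ORG"
    between: Tuple[str, ...]   # lowercased tokens between the two mentions


def word_overlap_kernel(a: RelationInstance, b: RelationInstance) -> float:
    """Number of distinct words shared by the two between-mention contexts.

    This equals the dot product of binary bag-of-words vectors,
    so it is a valid (positive semi-definite) kernel.
    """
    return float(len(set(a.between) & set(b.between)))


def entity_type_kernel(a: RelationInstance, b: RelationInstance) -> float:
    """Number of argument slots whose entity types match (0, 1, or 2)."""
    return float(a.e1_type == b.e1_type) + float(a.e2_type == b.e2_type)


def composite_kernel(a: RelationInstance, b: RelationInstance,
                     alpha: float = 0.5) -> float:
    """Convex combination of the two sub-kernels; alpha is an assumed weight."""
    return (alpha * word_overlap_kernel(a, b)
            + (1.0 - alpha) * entity_type_kernel(a, b))


x = RelationInstance("PER", "ORG", ("works", "at"))
y = RelationInstance("PER", "ORG", ("is", "employed", "at"))
z = RelationInstance("PER", "LOC", ("was", "born", "in"))

print(composite_kernel(x, y))  # shares "at" and both entity types
print(composite_kernel(x, z))  # shares only the first entity type
```

In a kernel-based classifier, pairwise values like these would fill a Gram matrix over the training instances; a real system would replace the toy sub-kernels with kernels over sequences, parse trees, or dependency structures.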

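Experimental results

Research questions

RQ1: What are the main RE paradigms, and how do they differ in input, output, and level of supervision?
RQ2: Which feature and kernel representations effectively capture relational information between entity mentions?
RQ3: How do the different tree-based and graph-based kernels compare on RE tasks?
RQ4: Which datasets and evaluation methods have shaped progress in RE, and what are the current and future challenges?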

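Key findings

Feature-based methods rely on carefully engineered lexical, syntactic, and semantic features to classify relation instances.
Kernel-based approaches reduce the need for explicit feature design by measuring similarity over representations such as sequences, syntactic trees, and dependency structures.
Several RE kernels (sequence, syntactic tree, dependency tree, dependency path) are examined, and composite kernels often perform best in ACE-based evaluations.
The ACE 2003/2004 datasets are the central RE benchmarks, and the survey discusses how representation and context affect performance.
Class imbalance and domain dependence are highlighted as challenges for supervised RE methods.
Open Information Extraction and distant supervision are identified as important trends shaping scalable RE.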
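
The distant supervision idea mentioned above can be sketched in a few lines of Python. This is an illustrative assumption, not the survey's procedure: the toy knowledge base `KB`, the function `distant_labels`, and the plain substring matching are all hypothetical. The sketch shows the usual heuristic (any sentence mentioning both entities of a known fact is labeled with that fact's relation) and why the resulting labels are noisy.

```python
from typing import Dict, List, Tuple

# Hypothetical toy knowledge base: (entity1, entity2) -> relation label
KB: Dict[Tuple[str, str], str] = {
    ("Alice Smith", "Acme Corp"): "employed_at",
}

Labeled = Tuple[str, str, str, str]  # (sentence, entity1, entity2, relation)


def distant_labels(sentences: List[str],
                   kb: Dict[Tuple[str, str], str]) -> List[Labeled]:
    """Label a sentence with a KB relation when both entities occur in it.

    Matching is naive substring search (an assumption for brevity);
    a real pipeline would use entity recognition and linking.
    """
    labeled: List[Labeled] = []
    for sentence in sentences:
        for (e1, e2), relation in kb.items():
            if e1 in sentence and e2 in sentence:
                labeled.append((sentence, e1, e2, relation))
    return labeled


sentences = [
    "Alice Smith works at Acme Corp.",
    "Alice Smith sued Acme Corp last year.",  # mentions both, different relation
    "Bob visited Pune.",
]

training_data = distant_labels(sentences, KB)
for example in training_data:
    print(example)
```

The second sentence receives the `employed_at` label even though it does not express that relation, which is the kind of label noise that methods built on distant supervision have to tolerate.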


This review was generated by AI and checked by a human editor.