QUICK REVIEW

[Paper Review] A Survey on Deep Learning for Named Entity Recognition

Jing Li, Aixin Sun|arXiv (Cornell University)|Dec 22, 2018

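Topic Modeling | References: 194 | Citations: 100

One-Sentence Summary

The paper surveys deep learning methods for NER, organizes approaches by input representation, context encoder, and tag decoder, and discusses datasets, tools, evaluation, challenges, and future directions.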

ABSTRACT

Named entity recognition (NER) is the task to identify mentions of rigid designators from text belonging to predefined semantic types such as person, location, organization etc. NER always serves as the foundation for many natural language applications such as question answering, text summarization, and machine translation. Early NER systems got a huge success in achieving good performance with the cost of human engineering in designing domain-specific features and rules. In recent years, deep learning, empowered by continuous real-valued vector representations and semantic composition through nonlinear processing, has been employed in NER systems, yielding state-of-the-art performance. In this paper, we provide a comprehensive review on existing deep learning techniques for NER. We first introduce NER resources, including tagged NER corpora and off-the-shelf NER tools. Then, we systematically categorize existing works based on a taxonomy along three axes: distributed representations for input, context encoder, and tag decoder. Next, we survey the most representative methods for recent applied techniques of deep learning in new NER problem settings and applications. Finally, we present readers with the challenges faced by NER systems and outline future directions in this area.

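Motivation and Objectives

Motivate and define NER, and explain its importance in NLP applications.
Summarize the available English NER datasets and off-the-shelf tools.
Propose a taxonomy of DL-based NER by input representation, context encoder, and tag decoder.
Survey representative deep learning methods and their applicability to NER settings.
Discuss the challenges of DL-based NER and outline future research directions.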

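Proposed Approach

Introduces a three-axis taxonomy of DL-based NER: distributed representations for input, context encoder (CNN/RNN), and tag decoder.
Reviews the word-level, character-level, and hybrid representations used in NER models.
Discusses context encoder architectures (e.g., CNN, RNN, LSTM, BiLSTM) and their interaction with tag decoding.
Summarizes dataset resources (e.g., CoNLL03, OntoNotes) and widely used NER tools (from academia and industry).
Explains evaluation metrics (exact-match and relaxed-match), conventional evaluation schemes, and their trade-offs.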
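To make the three-axis taxonomy concrete, the following minimal sketch shows how data could flow through the three components. It is an illustration only, not a method from the paper: the hand-written features stand in for learned embeddings, the window average stands in for a CNN/RNN encoder, and the hand-set weights stand in for a trained decoder. All function names, the `WEIGHTS` and `BIASES` values, and the two-tag set (`ENT`/`O`) are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]

def input_representation(tokens: Sequence[str]) -> List[Vector]:
    """Axis 1: one feature vector per token (stand-in for learned embeddings)."""
    return [
        [
            1.0 if tok[:1].isupper() else 0.0,  # character-level cue: capitalised
            1.0 if tok.isalpha() else 0.0,      # character-level cue: alphabetic
        ]
        for tok in tokens
    ]

def context_encoder(feats: List[Vector], window: int = 1) -> List[Vector]:
    """Axis 2: append the window average to each token (stand-in for CNN/RNN)."""
    encoded = []
    for i in range(len(feats)):
        neigh = feats[max(0, i - window): i + window + 1]
        ctx = [sum(col) / len(neigh) for col in zip(*neigh)]
        encoded.append(feats[i] + ctx)  # own features + context average
    return encoded

# Hypothetical, hand-set (untrained) parameters of a linear tag scorer.
WEIGHTS: Dict[str, Vector] = {
    "O": [0.0, 0.0, 0.0, 0.0],
    "ENT": [1.0, 0.0, 0.5, 0.0],
}
BIASES: Dict[str, float] = {"O": 0.5, "ENT": 0.0}

def tag_decoder(encoded: List[Vector]) -> List[str]:
    """Axis 3: score every tag per token and keep the best one (argmax)."""
    tags = []
    for vec in encoded:
        scores = {
            tag: BIASES[tag] + sum(w * x for w, x in zip(WEIGHTS[tag], vec))
            for tag in WEIGHTS
        }
        tags.append(max(scores, key=scores.get))
    return tags

def tag_sentence(tokens: Sequence[str]) -> List[str]:
    """Compose the three axes: representation -> encoder -> decoder."""
    return tag_decoder(context_encoder(input_representation(tokens)))

if __name__ == "__main__":
    print(tag_sentence(["Alice", "visited", "Paris", "."]))
```

In a real system each stage would be a trained neural component; the point of the sketch is only that the three stages are separable, so any one of them can be swapped without changing the interface of the others.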

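Experimental Results

Research Questions

RQ1: What are the dominant deep learning architectures and representations used in NER, and how do they affect performance?
RQ2: Which resources (datasets and tools) are most influential for English NER research and benchmarking?
RQ3: How should NER systems be evaluated, and what are the strengths and weaknesses of the different evaluation schemes?
RQ4: What are the main challenges and open directions for DL-based NER in practice and in research?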

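Key Findings

Automatic representation learning and end-to-end training have made deep learning the dominant approach to NER.
Hybrid representations that combine word-level and character-level features with lexicon (gazetteer) data and contextualized embeddings are central to deep-learning-based NER.
The context encoder (CNN, RNN, LSTM, Transformer) and the tag decoder jointly determine NER performance. Context modeling is important.
A diverse range of English NER datasets covering news, web, biomedical, and user-generated text is available, along with off-the-shelf tools.
Evaluation frameworks include exact-match (boundary and type) and relaxed-match, with a trade-off between interpretability and comparability.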
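The difference between exact-match and relaxed-match evaluation can be illustrated with a short sketch. It assumes BIO-encoded tag sequences (an assumption; the review does not name a tagging scheme) and uses one possible relaxed criterion: a predicted entity counts as correct if its type matches and its span overlaps a gold entity. The function names are hypothetical, and other relaxed definitions exist.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)

def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (orphan I- tags are ignored)."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans

def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

def exact_match(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Credit a prediction only if both boundaries and type equal a gold entity."""
    g, p = bio_to_spans(gold), bio_to_spans(pred)
    correct = len(g & p)
    precision = correct / len(p) if p else 0.0
    recall = correct / len(g) if g else 0.0
    return precision, recall, f1(precision, recall)

def overlaps(a: Span, b: Span) -> bool:
    """Same type and at least one shared token position."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]

def relaxed_match(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Credit a prediction if its type matches and it overlaps a gold entity."""
    g, p = bio_to_spans(gold), bio_to_spans(pred)
    pred_hits = sum(any(overlaps(x, y) for y in g) for x in p)
    gold_hits = sum(any(overlaps(y, x) for x in p) for y in g)
    precision = pred_hits / len(p) if p else 0.0
    recall = gold_hits / len(g) if g else 0.0
    return precision, recall, f1(precision, recall)

if __name__ == "__main__":
    # Tokens: Alice visited New York City
    gold = ["B-PER", "O", "B-LOC", "I-LOC", "I-LOC"]
    pred = ["B-PER", "O", "B-LOC", "I-LOC", "O"]  # "City" is missed
    print("exact  :", exact_match(gold, pred))
    print("relaxed:", relaxed_match(gold, pred))
```

In this example the truncated location counts as an error under exact match but as a hit under the relaxed criterion, so the two schemes report different scores for the same prediction. Scores are therefore comparable across systems only when the matching rule is stated.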


This review was generated by AI and checked by a human editor.