QUICK REVIEW

[Paper Review] Sentiment Analysis of German Twitter

Wladimir Sidorenko|arXiv (Cornell University)|Jan 1, 2019

Sentiment Analysis and Opinion Mining|References: 155|Citations: 4

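One-Line Summary

This thesis introduces a large, manually annotated corpus of German tweets for sentiment analysis and proposes new methods for sentiment analysis of German social media. It reports state-of-the-art performance on German Twitter sentiment tasks through improved sentiment lexicon generation, fine-grained opinion extraction with extended CRFs, message-level classification that combines lexical information with an attention mechanism, and discourse-aware analysis based on latent-marginalized CRFs and the Recursive Dirichlet Process.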

ABSTRACT

The immense popularity of online communication services in the last decade has not only upended our lives (with news spreading like wildfire on the Web, presidents announcing their decisions on Twitter, and the outcome of political elections being determined on Facebook) but also dramatically increased the amount of data exchanged on these platforms. Therefore, if we wish to understand the needs of modern society better and want to protect it from new threats, we urgently need more robust, higher-quality natural language processing (NLP) applications that can recognize such necessities and menaces automatically, by analyzing uncensored texts. Unfortunately, most NLP programs today have been created for standard language, as we know it from newspapers, or, in the best case, adapted to the specifics of English social media. This thesis reduces the existing deficit by entering the new frontier of German online communication and addressing one of its most prolific forms—users’ conversations on Twitter. In particular, it explores the ways and means by how people express their opinions on this service, examines current approaches to automatic mining of these feelings, and proposes novel methods, which outperform state-of-the-art techniques. For this purpose, I introduce a new corpus of German tweets that have been manually annotated with sentiments, their targets and holders, as well as lexical polarity items and their contextual modifiers. Using these data, I explore four major areas of sentiment research: (i) generation of sentiment lexicons, (ii) fine-grained opinion mining, (iii) message-level polarity classification, and (iv) discourse-aware sentiment analysis. In the first task, I compare three popular groups of lexicon generation methods: dictionary-, corpus-, and word-embedding–based ones, finding that dictionary-based systems generally yield better polarity lists than the last two groups. 
Apart from this, I propose a linear projection algorithm, whose results surpass many existing automatically-generated lexicons. Afterwards, in the second task, I examine two common approaches to automatic prediction of sentiment spans, their sources, and targets: conditional random fields (CRFs) and recurrent neural networks, obtaining higher scores with the former model and improving these results even further by redefining the structure of CRF graphs. When dealing with message-level polarity classification, I juxtapose three major sentiment paradigms: lexicon-, machine-learning–, and deep-learning–based systems, and try to unite the first and last of these method groups by introducing a bidirectional neural network with lexicon-based attention. Finally, in order to make the new classifier aware of microblogs' discourse structure, I let it separately analyze the elementary discourse units of each tweet and infer the overall polarity of a message from the scores of its EDUs with the help of two new approaches: latent-marginalized CRFs and Recursive Dirichlet Process.

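Research Motivation and Objectives

Address the shortage of high-quality, manually annotated German social media data for sentiment analysis.
Develop and evaluate new methods for sentiment analysis of German Twitter, focusing on lexicon generation, opinion extraction, message-level classification, and discourse-aware analysis.
Build a comprehensive resource for training and evaluating German NLP systems in a low-resource, informal language setting.
Improve performance on sentiment analysis tasks by integrating contextual modifiers, linguistic structure, and discourse-aware modeling.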

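Proposed Methods

Introduces a new, manually annotated corpus of German tweets that includes sentiment labels, targets, holders, and lexical polarity items.
Compares dictionary-based, corpus-based, and word-embedding-based lexicon generation methods, finds that dictionary-based methods generally perform best, and introduces a linear projection algorithm.
Uses conditional random fields (CRFs) with a restructured graph topology to improve fine-grained opinion extraction.
Proposes a bidirectional neural network with lexicon-based attention for message-level sentiment classification.
Models discourse structure with latent-marginalized CRFs and the Recursive Dirichlet Process, inferring the overall polarity of a tweet from its elementary discourse units (EDUs).
Performs inference in linear-chain, semi-Markov, and tree-structured CRFs using belief propagation and Viterbi decoding, with modified computation of the α and β scores.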
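
To make the last point more concrete, the following is a minimal sketch of Viterbi decoding for a plain linear-chain model. It is not the thesis's implementation: the function name `viterbi`, the score tables, and the two-label toy setup (0 = outside, 1 = sentiment) are all assumptions made for illustration, and the semi-Markov and tree-structured variants, as well as the modified α and β computations, are not covered.

```python
from typing import List, Sequence


def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring label sequence of a linear-chain model.

    emissions[t][y]   : log-score of label y at position t (hypothetical input)
    transitions[p][y] : log-score of moving from label p to label y
    """
    n_steps = len(emissions)
    n_labels = len(transitions)
    if n_steps == 0:
        return []

    # best[t][y] holds the best score of any path that ends in label y at t.
    best = [list(emissions[0])]
    backpointers: List[List[int]] = []

    for t in range(1, n_steps):
        scores, pointers = [], []
        for y in range(n_labels):
            # Pick the previous label that maximizes the path score into y.
            prev = max(range(n_labels),
                       key=lambda p: best[t - 1][p] + transitions[p][y])
            pointers.append(prev)
            scores.append(best[t - 1][prev] + transitions[prev][y]
                          + emissions[t][y])
        best.append(scores)
        backpointers.append(pointers)

    # Start from the best final label and follow the back-pointers.
    path = [max(range(n_labels), key=lambda y: best[-1][y])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy example with made-up scores for a three-token tweet.
toy_emissions = [[2.0, 0.0], [0.0, 1.0], [0.0, 1.5]]
toy_transitions = [[0.5, 0.0], [0.0, 0.5]]
toy_path = viterbi(toy_emissions, toy_transitions)
```

In this toy setup the decoder labels the first token as outside and the remaining two as sentiment. A real CRF would learn the emission and transition scores from annotated data rather than take them as fixed inputs.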

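Experimental Results

Research Questions

RQ1: Which family of methods (dictionary-based, corpus-based, or word-embedding-based) produces the most reliable sentiment lexicons for German Twitter?
RQ2: Can restructured CRF graphs improve fine-grained opinion extraction on German Twitter?
RQ3: How does integrating lexicon-based attention into a bidirectional neural network affect message-level sentiment classification?
RQ4: To what extent does modeling discourse structure improve sentiment classification of microblogs?
RQ5: Can latent-marginalized CRFs and the Recursive Dirichlet Process effectively model discourse-aware sentiment inference on Twitter?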

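Key Findings

In terms of polarity-list quality, dictionary-based lexicon generation methods outperform corpus-based and word-embedding-based ones.
The proposed linear projection algorithm outperforms many existing automatically generated lexicons.
CRF-based models with restructured graphs achieve higher scores on fine-grained opinion extraction than standard CRFs and RNNs.
A bidirectional neural network with lexicon-based attention improves message-level sentiment classification by combining the strengths of lexical features and deep learning.
Latent-marginalized CRFs and the Recursive Dirichlet Process strengthen discourse-aware sentiment analysis by modeling elementary discourse units and their hierarchical relations.
The proposed methods achieve state-of-the-art results on all four sentiment analysis tasks on the new German Twitter corpus.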
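
The review does not spell out how the linear projection algorithm works. As a rough illustration of the general idea of deriving polarity scores by projecting word embeddings onto a line, here is a minimal sketch under stated assumptions: the axis is simply the difference between the centroids of positive and negative seed words, and the function names, the two-dimensional toy vectors, and the seed words are all hypothetical. It should not be read as the thesis's actual algorithm.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def polarity_scores(embeddings: Dict[str, Vector],
                    pos_seeds: List[str],
                    neg_seeds: List[str]) -> Dict[str, float]:
    """Project every word onto the line through the two seed centroids.

    Assumption: the polarity axis is (positive centroid - negative centroid).
    Positive scores lean positive, negative scores lean negative.
    """
    pos_center = centroid([embeddings[w] for w in pos_seeds])
    neg_center = centroid([embeddings[w] for w in neg_seeds])

    axis = [p - n for p, n in zip(pos_center, neg_center)]
    norm = sum(a * a for a in axis) ** 0.5
    if norm == 0.0:
        raise ValueError("seed centroids coincide; no polarity axis")
    axis = [a / norm for a in axis]

    # Measure each projection relative to the midpoint between the centroids.
    midpoint = [(p + n) / 2.0 for p, n in zip(pos_center, neg_center)]
    return {
        word: sum((x - m) * a for x, m, a in zip(vector, midpoint, axis))
        for word, vector in embeddings.items()
    }


# Toy two-dimensional embeddings with made-up values.
toy_embeddings = {
    "gut": [1.0, 0.0],
    "schlecht": [-1.0, 0.0],
    "toll": [0.8, 0.3],
    "mies": [-0.7, -0.2],
}
toy_scores = polarity_scores(toy_embeddings, ["gut"], ["schlecht"])
```

With these toy vectors, "toll" receives a positive score and "mies" a negative one. A lexicon could then be formed by keeping the words with the largest absolute scores, although the cut-off would have to be tuned on held-out data.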

This review was generated by AI and checked by a human editor.