QUICK REVIEW

[Paper Review] Hate Lingo: A Target-based Linguistic Analysis of Hate Speech in Social Media

Mai ElSherief, Vivek Kulkarni|arXiv (Cornell University)|Apr 11, 2018

Hate Speech and Cyberbullying Detection|Citations: 43

One-Sentence Summary

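This paper distinguishes hate speech by its target (Directed vs Generalized), builds a dataset covering both types, and analyzes their linguistic, psycholinguistic, and semantic patterns to improve the understanding and detection of hate speech.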

ABSTRACT

While social media empowers freedom of expression and individual voices, it also enables anti-social behavior, online harassment, cyberbullying, and hate speech. In this paper, we deepen our understanding of online hate speech by focusing on a largely neglected but crucial aspect of hate speech -- its target: either "directed" towards a specific person or entity, or "generalized" towards a group of people sharing a common protected characteristic. We perform the first linguistic and psycholinguistic analysis of these two forms of hate speech and reveal the presence of interesting markers that distinguish these types of hate speech. Our analysis reveals that Directed hate speech, in addition to being more personal and directed, is more informal, angrier, and often explicitly attacks the target (via name calling) with fewer analytic words and more words suggesting authority and influence. Generalized hate speech, on the other hand, is dominated by religious hate, is characterized by the use of lethal words such as murder, exterminate, and kill; and quantity words such as million and many. Altogether, our work provides a data-driven analysis of the nuances of online-hate speech that enables not only a deepened understanding of hate speech and its social implications but also its detection.

Motivation and Objectives

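Identify and characterize hate speech by its target: Directed (toward a specific person or entity) vs Generalized (toward a group sharing a protected characteristic).
Build a high-quality dataset of Directed and Generalized hate tweets for robust analysis.
Reveal the lexical, semantic, and psycholinguistic patterns that distinguish the two forms of hate speech.
Assess the implications for hate speech detection, policy, and social impact.
Provide data-driven insights that improve hate speech detection while informing discourse policy.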

Proposed Method

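Build multiple hate speech datasets from the 1% Twitter streaming sample (key phrase-based and hashtag-based), supplemented with public datasets and NHSM data.
Apply Perspective API toxicity scores and the attack_on_commenter model to filter for high-quality hate speech candidates, using mentions and second-person pronouns as signals of directedness.
Use human annotation on Crowdflower with multiple annotators to label Directed vs Generalized hate speech, and compute inter-annotator reliability.
Run lexical analysis with SAGE to extract the salient words of each category, and perform named entity recognition with T-NER, focusing on domain-specific entities.
Run psycholinguistic analysis with LIWC2015, measuring dimensions such as analytical thinking, clout, authenticity, and emotion.
Annotate frame semantics with SEMAFOR and compare frames across Directed, Generalized, and Gen-1% tweets.
Analyze entity distributions, semantic frames, and LIWC measures to contrast Directed vs Generalized hate speech.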
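
The directedness signals mentioned in the filtering step (mentions and second-person pronouns) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the pronoun list, and the `require_both` switch are all assumptions made for illustration, and the review does not say whether the paper requires both signals or only one.

```python
import re

# Assumption: a mention is "@" followed by word characters, not preceded by
# a word character (so e-mail-like strings are not counted).
MENTION_RE = re.compile(r"(?<!\w)@\w+")

# Assumption: a small illustrative list of English second-person pronouns.
SECOND_PERSON = frozenset({"you", "your", "yours", "yourself"})


def directedness_signals(tweet):
    """Return (has_mention, has_second_person) for a tweet."""
    has_mention = MENTION_RE.search(tweet) is not None
    # Remove mentions first so a handle such as "@you" is not read as a pronoun.
    text_without_mentions = MENTION_RE.sub(" ", tweet).lower()
    tokens = re.findall(r"[a-z]+", text_without_mentions)
    has_second_person = any(tok in SECOND_PERSON for tok in tokens)
    return has_mention, has_second_person


def looks_directed(tweet, require_both=True):
    """Hypothetical heuristic: combine the two signals with AND or OR."""
    has_mention, has_second_person = directedness_signals(tweet)
    if require_both:
        return has_mention and has_second_person
    return has_mention or has_second_person
```

A heuristic like this would only pre-select candidates; in the pipeline described above, the final Directed vs Generalized labels come from human annotators.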

Experimental Results

Research Questions

RQ1: What linguistic and psycholinguistic markers distinguish Directed hate speech from Generalized hate speech?
RQ2: How do lexical, semantic-frame, and entity patterns differ between Directed and Generalized hate speech in social media?
RQ3: What implications do the target-based distinctions have for hate speech detection and policy?

Key Findings

Directed hate speech is more personal, informal, angrier, and shows higher clout than Generalized hate speech.
Generalized hate speech is dominated by religion-related terms and includes lethal language (kill, murder, exterminate) and quantity words (million, many).
Named entity analysis shows Directed hate contains more person entities, while Generalized hate features more religious and group-related entities.
LIWC analysis reveals Directed hate has lower analytical thinking and higher informal and social language, with greater anger; Generalized hate shows higher authenticity and emotion, and more religion-focused content.
Frame-semantic analysis shows Directed hate emphasizes intentional acts and hindering, while Generalized hate emphasizes Killing, Religion, and Quantity frames.
Salient words learned via SAGE indicate minimal overlap between categories, with distinct topic domains for each hate speech type.
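
To make the idea of "salient words" concrete, here is a small sketch that ranks words by how much more frequent they are in one corpus than in another. It is not SAGE: SAGE fits a sparse additive generative model, whereas this sketch swaps in a much simpler smoothed log-odds ratio. The function name, the whitespace tokenization, and the smoothing constant `alpha` are assumptions for illustration only.

```python
import math
from collections import Counter


def salient_words(target_docs, background_docs, top_k=3, alpha=1.0):
    """Rank words by smoothed log-odds of target vs background frequency.

    A simplified stand-in for SAGE, not the model used in the paper.
    """
    # Assumption: naive lowercase whitespace tokenization.
    target = Counter(w for d in target_docs for w in d.lower().split())
    background = Counter(w for d in background_docs for w in d.lower().split())
    vocab = set(target) | set(background)

    # Additive (Laplace-style) smoothing so unseen words get nonzero mass.
    n_target = sum(target.values()) + alpha * len(vocab)
    n_background = sum(background.values()) + alpha * len(vocab)

    scores = {
        w: math.log((target[w] + alpha) / n_target)
        - math.log((background[w] + alpha) / n_background)
        for w in vocab
    }
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]
```

For example, with the toy corpora `["many many people", "million people"]` as target and `["people talk", "people walk"]` as background, `salient_words(..., top_k=2)` ranks the quantity words `many` and `million` above the shared word `people`, which mirrors how a word common to both categories contributes little to either category's salient list.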


This review was generated by AI and checked by a human editor.