QUICK REVIEW

[Paper Review] Interpretable Multi-Modal Hate Speech Detection

Prashanth Vijayaraghavan, Hugo Larochelle|arXiv (Cornell University)|Mar 2, 2021

Hate Speech and Cyberbullying Detection|References: 28|Citations: 24

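One-Sentence Summary

This paper proposes a deep multi-modal model that combines text semantics with socio-cultural context and social graph features. The model outperforms text-only baselines and provides interpretable insights into its predictions.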

ABSTRACT

With growing role of social media in shaping public opinions and beliefs across the world, there has been an increased attention to identify and counter the problem of hate speech on social media. Hate speech on online spaces has serious manifestations, including social polarization and hate crimes. While prior works have proposed automated techniques to detect hate speech online, these techniques primarily fail to look beyond the textual content. Moreover, few attempts have been made to focus on the aspects of interpretability of such models given the social and legal implications of incorrect predictions. In this work, we propose a deep neural multi-modal model that can: (a) detect hate speech by effectively capturing the semantics of the text along with socio-cultural context in which a particular hate expression is made, and (b) provide interpretable insights into decisions of our model. By performing a thorough evaluation of different modeling techniques, we demonstrate that our model is able to outperform the existing state-of-the-art hate speech classification approaches. Finally, we show the importance of social and cultural context features towards unearthing clusters associated with different categories of hate.

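Motivation and Objectives

Motivate the need to detect hate speech using socio-cultural context beyond the text itself.
Develop a multi-modal neural model that integrates text, demographic, and social graph features.
Show that social and cultural context improves hate speech detection performance.
Provide interpretable insights into the model's decisions through attention mechanisms.
Show that the learned embeddings can be used to cluster hate expressions into categories.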

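Proposed Method

Define a multi-modal hate speech dataset D(H) containing tweets and author attributes.
Generate text features using character-enhanced word representations and self-attention.
Extract socio-cultural context from a representation of the author's demographics, obtained with a pre-trained demographic classifier.
Build social context features from the follow graph G^h of hate communities and map them to low-dimensional vectors.
Fuse the text and socio-cultural features with a late-fusion self-attention mechanism to produce the final representation used for classification.
Train the model with categorical cross-entropy and evaluate it against traditional and deep learning baselines.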
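
The pipeline listed above (attention over tokens, late fusion across modalities, cross-entropy training) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that attention is a dot product against a single query vector, that all feature vectors share one size, and that random numbers stand in for the trained encoders. Every name here (`attention_pool`, `DIM`, `demographic_vec`, `graph_vec`) is hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attention_pool(vectors, query):
    """Score each vector against a query, normalize the scores with softmax,
    and return (weighted sum of the vectors, attention weights)."""
    weights = softmax([dot(v, query) for v in vectors])
    dim = len(vectors[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return pooled, weights


def cross_entropy(probs, gold_index):
    """Categorical cross-entropy for one example with a one-hot label."""
    return -math.log(probs[gold_index])


random.seed(0)
DIM = 4  # hypothetical shared feature size (assumption, not from the paper)


def rand_vec(n):
    """Random stand-in for a learned vector."""
    return [random.uniform(-1.0, 1.0) for _ in range(n)]


# Random stand-ins for encoder outputs; a real model would learn these.
tokens = ["example", "tweet", "text"]
token_states = [rand_vec(DIM) for _ in tokens]  # per-token text encoder states
demographic_vec = rand_vec(DIM)                 # socio-cultural context features
graph_vec = rand_vec(DIM)                       # follow-graph embedding

# 1) Attention over tokens yields one text vector plus per-token weights.
text_vec, token_weights = attention_pool(token_states, rand_vec(DIM))

# 2) Late fusion: attention over the three modality vectors.
fused, modality_weights = attention_pool(
    [text_vec, demographic_vec, graph_vec], rand_vec(DIM)
)

# 3) Linear classifier over {non-hate, hate}, scored with cross-entropy.
class_weights = [rand_vec(DIM) for _ in range(2)]
probs = softmax([dot(w, fused) for w in class_weights])
loss = cross_entropy(probs, gold_index=1)

# The most attended token is the kind of signal used for interpretation.
top_token = max(zip(token_weights, tokens))[1]
```

Both attention steps return their weights, so the same quantities that build the final representation can be read off to see which tokens and which modality contributed most to a prediction.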

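Experimental Results

Research Questions

RQ1: Does incorporating socio-cultural and social context features improve hate speech detection over text-only models?
RQ2: How do demographic and social graph features contribute to hate speech detection and to the clustering of hate categories?
RQ3: Can attention weights provide interpretable insights into the model's predictions?
RQ4: How large is the relative gain from multi-modal fusion compared with text-only and traditional models?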

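Key Findings

The proposed multi-modal model outperforms traditional and text-only deep learning baselines on F1 (hate) and F1 (overall).
Text+SC models (text combined with socio-cultural features) outperform their text-only counterparts (e.g., BiGRU+Char+Attn+FF: F1 Hate 0.784, F1 Overall 0.90).
Incorporating social and cultural context substantially improves performance over text-only models (e.g., BiGRU+Char+Attn: F1 Hate 0.744, F1 Overall 0.864).
The model learns embeddings of hate expressions that cluster into the categories Anti-Islam, Anti-Black, Anti-Immigrant, General Hate, and Anti-Semitic, supported by qualitative evidence from the top-attended words.
Attention-based interpretability is consistent with perturbation-based methods and highlights the code words and contextual cues behind predictions.
Cluster purity scores show better agreement with ground-truth hate categories when socio-cultural context is used (Text+SC: 0.76) than with text alone (Text Only: 0.52).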
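
For readers unfamiliar with cluster purity, the sketch below computes it under the standard definition: each cluster is credited with its most frequent ground-truth label, and the credited counts are divided by the total number of items. Whether the paper uses exactly this definition is an assumption. The function name `cluster_purity` and the toy assignments are hypothetical and are not data from the paper.

```python
from collections import Counter


def cluster_purity(cluster_ids, true_labels):
    """Fraction of items whose ground-truth label matches the majority
    label of the cluster they were assigned to."""
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have equal length")
    if not cluster_ids:
        raise ValueError("at least one item is required")

    # Count ground-truth labels inside each cluster.
    label_counts = {}
    for cluster, label in zip(cluster_ids, true_labels):
        label_counts.setdefault(cluster, Counter())[label] += 1

    # Credit each cluster with the size of its majority label.
    majority_total = sum(max(counts.values()) for counts in label_counts.values())
    return majority_total / len(true_labels)


# Toy illustration only: six items spread over three clusters.
clusters = [0, 0, 0, 1, 1, 2]
labels = [
    "Anti-Islam", "Anti-Islam", "Anti-Black",
    "Anti-Black", "Anti-Black", "Anti-Semitic",
]
purity = cluster_purity(clusters, labels)
```

A purity of 1.0 means every cluster contains a single category. Purity also rises trivially as the number of clusters grows, so scores such as 0.76 and 0.52 are comparable only when both settings use the same number of clusters.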


This review was generated by AI and checked by a human editor.