QUICK REVIEW

[Paper Review] Ex Machina: Personal Attacks Seen at Scale

Ellery Wulczyn, Nithum Thain|arXiv (Cornell University)|Oct 27, 2016

Hate Speech and Cyberbullying Detection|References: 23|Citations: 203

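One-line summary

This paper combines crowdsourcing and machine learning to detect personal attacks on English Wikipedia at scale. It presents a classifier that performs about as well as the aggregated labels of 3 crowd-workers and uses it to analyze the prevalence and patterns of attacks.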

ABSTRACT

The damage personal attacks cause to online discourse motivates many platforms to try to curb the phenomenon. However, understanding the prevalence and impact of personal attacks in online platforms at scale remains surprisingly difficult. The contribution of this paper is to develop and illustrate a method that combines crowdsourcing and machine learning to analyze personal attacks at scale. We show an evaluation method for a classifier in terms of the aggregated number of crowd-workers it can approximate. We apply our methodology to English Wikipedia, generating a corpus of over 100k high quality human-labeled comments and 63M machine-labeled ones from a classifier that is as good as the aggregate of 3 crowd-workers, as measured by the area under the ROC curve and Spearman correlation. Using this corpus of machine-labeled scores, our methodology allows us to explore some of the open questions about the nature of online personal attacks. This reveals that the majority of personal attacks on Wikipedia are not the result of a few malicious users, nor primarily the consequence of allowing anonymous contributions from unregistered users.

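Motivation and objectives

Quantify, at scale, the prevalence and impact of personal attacks on Wikipedia talk pages.
Develop a scalable methodology that combines crowdsourcing and machine learning to label a large corpus of personal attacks.
Evaluate how closely machine-generated labels approximate crowd judgments, and calibrate thresholds for reliable analysis.
Enable longitudinal analysis of attacks across subgroups, contributor types, and moderation actions.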

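Proposed method

Crowdsource a labeled corpus that identifies personal attacks in Wikipedia talk comments, with multiple annotators per comment.
Train binary text classifiers (logistic regression, LR, and multi-layer perceptron, MLP) on word n-gram or character n-gram features.
Experiment with two labeling schemes: OH (one-hot) majority-vote labels and ED (empirical distribution) labels. An ED label is the fraction of annotators who marked the comment as an attack.
Evaluate models with AUC and Spearman correlation by comparing their predictions against the crowd annotations.
Develop an evaluation framework that compares machine learning models against ensembles of annotators (annotator ensemble baselining).
Apply the best model to the full history of Wikipedia comments and carry out large-scale analysis.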
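
The two labeling schemes and the character n-gram features can be sketched in a few lines. This is a minimal illustration, not the authors' code: the example comments and votes are invented, the helper names `oh_label`, `ed_label`, and `char_ngrams` are hypothetical, and the n-gram range of 1 to 3 is an assumption made for the example.

```python
from collections import Counter

def oh_label(votes):
    """One-hot (OH) label: 1 if a strict majority of annotators marked an attack."""
    return 1 if 2 * sum(votes) > len(votes) else 0

def ed_label(votes):
    """Empirical-distribution (ED) label: fraction of annotators who marked an attack."""
    return sum(votes) / len(votes)

def char_ngrams(text, n_min=1, n_max=3):
    """Count character n-grams for n in [n_min, n_max] (the range is an assumption)."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts

# Hypothetical annotations: 1 = attack, 0 = not an attack
votes_by_comment = {
    "thanks for fixing the citation": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "you are an idiot": [1, 1, 1, 0, 1, 1, 1, 1, 0, 1],
    "stop wasting everyone's time": [1, 0, 0, 1, 0, 1, 0, 0, 0, 1],
}

# Each comment gets both label types: (OH, ED)
labels = {c: (oh_label(v), ed_label(v)) for c, v in votes_by_comment.items()}
for comment, (oh, ed) in labels.items():
    print(f"OH={oh} ED={ed:.1f} features={len(char_ngrams(comment))} :: {comment}")
```

In this toy data the third comment splits the annotators 4 to 6. The OH label collapses that to 0, while the ED label keeps the disagreement as 0.4, which is the kind of graded signal a rank-based metric such as Spearman correlation can reward.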

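Experimental results

Research questions

RQ1: How prevalent are personal attacks on Wikipedia talk pages, and how does prevalence vary with user anonymity and activity level?
RQ2: For large-scale attack detection, which is more effective: crowd labels or machine-generated labels?
RQ3: How are attacks related to moderator actions and to the timing of discussions?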

Key findings

Model Type	N-Gram Type	Label Type	AUC	Spearman
LR	Word	OH	94.62	53.16
LR	Word	ED	95.55	65.20
LR	Char	OH	96.18	59.20
LR	Char	ED	96.24	66.68
MLP	Word	OH	95.25	56.11
MLP	Word	ED	96.15	66.33
MLP	Char	OH	95.90	58.77
MLP	Char	ED	96.59	68.17
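
The pairwise gaps in the table can be checked with a short calculation. The sketch below hard-codes the values from the table above; the dictionary layout and the names `results`, `gaps`, `ed_minus_oh`, and `char_minus_word` are illustrative choices, not part of the paper.

```python
# (model, n-gram type, label type) -> (AUC, Spearman), copied from the table
results = {
    ("LR", "Word", "OH"): (94.62, 53.16),
    ("LR", "Word", "ED"): (95.55, 65.20),
    ("LR", "Char", "OH"): (96.18, 59.20),
    ("LR", "Char", "ED"): (96.24, 66.68),
    ("MLP", "Word", "OH"): (95.25, 56.11),
    ("MLP", "Word", "ED"): (96.15, 66.33),
    ("MLP", "Char", "OH"): (95.90, 58.77),
    ("MLP", "Char", "ED"): (96.59, 68.17),
}

def gaps(axis, better, worse):
    """Return (AUC gap, Spearman gap) of `better` over `worse` along one axis,
    keyed by the two settings that are held fixed."""
    out = {}
    for key, (auc, rho) in results.items():
        if key[axis] != better:
            continue
        other = list(key)
        other[axis] = worse
        base_auc, base_rho = results[tuple(other)]
        rest = tuple(k for i, k in enumerate(key) if i != axis)
        out[rest] = (round(auc - base_auc, 2), round(rho - base_rho, 2))
    return out

ed_minus_oh = gaps(2, "ED", "OH")          # keyed by (model, n-gram type)
char_minus_word = gaps(1, "Char", "Word")  # keyed by (model, label type)

for name, table in (("ED - OH", ed_minus_oh), ("Char - Word", char_minus_word)):
    for setting, (d_auc, d_rho) in table.items():
        print(f"{name} {setting}: AUC {d_auc:+.2f}, Spearman {d_rho:+.2f}")
```

Every gap is positive in both directions. The smallest is the 0.06-point AUC gain from ED labels for LR with character n-grams, while the Spearman gains from ED labels all exceed 7 points.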

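Character n-gram features outperform word n-gram features for every model and label type, on both AUC and Spearman correlation.
Models trained on ED labels outperform those trained on OH labels on both AUC and Spearman correlation.
The best-performing settings (character n-grams with ED labels) reach an AUC of 96.24 to 96.59 and a Spearman correlation of 66.68 to 68.17 on the development data.
An annotator ensemble of size 3 performs on par with the best machine model, which indicates that the classifier approximates roughly 3 crowd-workers.
About 0.8% of comments in the random sample are labeled as attacks, compared with a much higher prevalence of about 11.7% in the training dataset (the blocked dataset).
Anonymous editors are 6 times more likely to post an attacking comment, but because they contribute far fewer comments overall, anonymous accounts are responsible for less than half of all attacks.
Fewer than one fifth of attacks lead to a moderator warning or block, and the clustering of attacks in time suggests that early moderator intervention may be effective.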
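
The annotator-ensemble comparison behind the "roughly 3 crowd-workers" finding can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's exact procedure: the vote matrix is invented, the split into a small ensemble and a held-out reference group is one plausible reading of the setup, and `average_ranks`, `spearman`, `auc`, and `ensemble_scores` are hypothetical helper names.

```python
def average_ranks(values):
    """Rank values from 1 to len(values); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity."""
    ranks = average_ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, l in zip(ranks, labels) if l == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Hypothetical votes: one row per comment, one column per annotator (1 = attack)
votes = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 0, 1, 1],
    [1, 0, 0, 0, 1, 0, 0, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 1, 0, 1, 1, 1],
]
REFERENCE_SIZE = 5  # assumption: the last 5 annotators form the held-out reference

def ensemble_scores(n):
    """Score the mean vote of the first n annotators against the reference group."""
    ref = [row[-REFERENCE_SIZE:] for row in votes]
    ref_ed = [sum(r) / len(r) for r in ref]                    # graded reference
    ref_oh = [1 if 2 * sum(r) > len(r) else 0 for r in ref]    # binary reference
    ens_ed = [sum(row[:n]) / n for row in votes]
    return auc(ref_oh, ens_ed), spearman(ens_ed, ref_ed)

baseline = {n: ensemble_scores(n) for n in (1, 3, 5)}
for n, (a, rho) in baseline.items():
    print(f"ensemble of {n}: AUC={a:.3f} Spearman={rho:.3f}")
```

A model's scores for the same comments would be passed through the same `auc` and `spearman` calls against the reference group. The ensemble size whose baseline the model matches then gives its equivalent number of crowd-workers.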

This review was generated by AI and checked by a human editor.