[Paper Review] Ex Machina: Personal Attacks Seen at Scale
The paper combines crowdsourcing and machine learning to detect personal attacks at scale on English Wikipedia. Its classifier performs on par with the aggregate of about three crowd workers, and the authors use it to analyze the prevalence and patterns of attacks.
The damage personal attacks cause to online discourse motivates many platforms to try to curb the phenomenon. However, understanding the prevalence and impact of personal attacks in online platforms at scale remains surprisingly difficult. The contribution of this paper is to develop and illustrate a method that combines crowdsourcing and machine learning to analyze personal attacks at scale. We show an evaluation method for a classifier in terms of the aggregated number of crowd-workers it can approximate. We apply our methodology to English Wikipedia, generating a corpus of over 100k high quality human-labeled comments and 63M machine-labeled ones from a classifier that is as good as the aggregate of 3 crowd-workers, as measured by the area under the ROC curve and Spearman correlation. Using this corpus of machine-labeled scores, our methodology allows us to explore some of the open questions about the nature of online personal attacks. This reveals that the majority of personal attacks on Wikipedia are not the result of a few malicious users, nor primarily the consequence of allowing anonymous contributions from unregistered users.
Motivation & Objective
- Quantify the prevalence and impact of personal attacks on Wikipedia talk pages at scale.
- Develop a scalable methodology combining crowdsourcing and machine learning to label large corpora for personal attacks.
- Evaluate how well machine-labeled data approximate crowd judgments and calibrate a threshold for reliable analysis.
- Enable longitudinal analysis of attacks across subgroups, contributor types, and moderation actions.
Proposed method
- Crowdsource a labeled corpus of Wikipedia talk comments to identify personal attacks, using multiple annotators per comment.
- Train binary text classifiers (LR and MLP) with word or character n-gram features.
- Experiment with two labeling schemes: one-hot (OH) labels based on the annotators' majority vote, and empirical distribution (ED) labels giving the fraction of annotators who marked the comment as an attack.
- Evaluate models using AUC and Spearman correlation to compare predictions against crowd-annotated labels.
- Develop an evaluation framework that compares a machine-learning model to an annotator ensemble (annotator ensemble baselining).
- Apply the best model to annotate the entire Wikipedia comment history and perform large-scale analyses.
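The two labeling schemes and the idea of annotator ensemble baselining can be illustrated with a small sketch. This is a simplified reading of the method, not the authors' code: the function names, the tie-breaking rule in the majority vote, and the choice to score an ensemble of n annotators against the majority vote of the remaining annotators are all assumptions made for illustration.

```python
import random


def one_hot_label(votes):
    """OH label: 1 if a strict majority of annotators flagged an attack.

    Ties resolve to 0 here; that tie rule is an assumption of this sketch.
    """
    return 1 if sum(votes) * 2 > len(votes) else 0


def empirical_distribution_label(votes):
    """ED label: fraction of annotators who flagged the comment as an attack."""
    return sum(votes) / len(votes)


def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative.

    Tied scores count as half a win. Quadratic in the number of comments,
    which is fine for a toy example.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ensemble_baseline_auc(annotations, ensemble_size, rng):
    """Score an ensemble of `ensemble_size` annotators against the rest.

    For each comment the votes are shuffled, the first `ensemble_size`
    votes form the ensemble (its ED label is the prediction), and the
    majority vote of the held-out annotators serves as the reference label.
    """
    truth, scores = [], []
    for votes in annotations:
        shuffled = list(votes)
        rng.shuffle(shuffled)
        ensemble = shuffled[:ensemble_size]
        held_out = shuffled[ensemble_size:]
        scores.append(empirical_distribution_label(ensemble))
        truth.append(one_hot_label(held_out))
    return roc_auc(truth, scores)


if __name__ == "__main__":
    # Hypothetical toy data: each row holds 10 binary votes for one comment.
    toy_annotations = [
        [1] * 10,
        [0] * 10,
        [1] * 7 + [0] * 3,
        [1] * 2 + [0] * 8,
    ]
    rng = random.Random(0)
    for n in (1, 3, 5):
        print(n, ensemble_baseline_auc(toy_annotations, n, rng))
```

A model's AUC on the same reference labels can then be placed next to these ensemble scores; the ensemble size whose score it matches is the number of crowd workers the model approximates.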
Experimental results
Research questions
- RQ1: What is the prevalence of personal attacks on Wikipedia talk pages, and how does it vary by user anonymity and activity?
- RQ2: How effective are crowd-labeled annotations versus machine-generated labels for scalable attack detection?
- RQ3: How do attacks relate to moderator actions and timing within discussions?
Key findings
- Character n-gram features outperform word n-gram features across models.
- Models trained on empirical distribution (ED) labels outperform those trained on one-hot (OH) labels in both AUC and Spearman correlation.
- The best-performing configurations (character n-grams with ED labeling) achieve an AUC of about 96–96.6 and a Spearman correlation of about 66–68 on development data (both reported on a 0–100 scale).
- An annotator ensemble of size 3 has performance comparable to the best machine model, meaning the classifier approximates three crowd-workers.
- About 0.8% of comments are labeled as attacks in random samples, with a higher prevalence in the “blocked” dataset used for training (≈11.7%).
- Anonymous editors are six times more likely to produce attacking comments, yet they contribute fewer than half of all attacks because they account for a smaller share of comment volume.
- Fewer than a fifth of attacks trigger moderator actions (warnings or blocks), and the clustering of attacks over time suggests that early moderator intervention could be impactful.
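The finding on anonymous editors combines a per-comment attack rate with comment volume, and a short calculation shows how a sixfold rate can still yield a minority of attacks. The sketch below is illustrative only: the function name is hypothetical, "six times more likely" is read as a ratio of per-comment attack rates, and the 10% volume share is a made-up input, not a figure from the paper.

```python
def attack_share(volume_share, rate_ratio):
    """Share of all attacks coming from one group of users.

    volume_share: fraction of all comments written by the group.
    rate_ratio:   the group's per-comment attack rate divided by
                  the per-comment attack rate of everyone else.
    """
    weighted = rate_ratio * volume_share
    return weighted / (weighted + (1.0 - volume_share))


if __name__ == "__main__":
    # Hypothetical input: the group writes 10% of comments at 6x the rate.
    print(attack_share(0.10, 6))
    # Break-even volume share for a 6x rate: 1 / (6 + 1).
    print(attack_share(1 / 7, 6))
```

Under these assumptions, a group with a sixfold attack rate produces fewer than half of all attacks whenever it writes less than one seventh of all comments.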
This review was created by AI and reviewed by human editors.