QUICK REVIEW

[Paper Review] Automated Hate Speech Detection and the Problem of Offensive Language

Thomas Davidson, Dana Warmsley|arXiv (Cornell University)|Mar 11, 2017

Hate Speech and Cyberbullying Detection | References: 14 | Cited by: 275

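One-Sentence Summary

This paper trains a multi-class classifier on a crowd-labeled tweet dataset to separate hate speech, offensive language, and tweets that are neither, and highlights both the difficulty of telling hate speech apart from general offensiveness and the role that context plays.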

ABSTRACT

A key challenge for automatic hate-speech detection on social media is the separation of hate speech from other instances of offensive language. Lexical detection methods tend to have low precision because they classify all messages containing particular terms as hate speech and previous work using supervised learning has failed to distinguish between the two categories. We used a crowd-sourced hate speech lexicon to collect tweets containing hate speech keywords. We use crowd-sourcing to label a sample of these tweets into three categories: those containing hate speech, only offensive language, and those with neither. We train a multi-class classifier to distinguish between these different categories. Close analysis of the predictions and the errors shows when we can reliably separate hate speech from other offensive language and when this differentiation is more difficult. We find that racist and homophobic tweets are more likely to be classified as hate speech but that sexist tweets are generally classified as offensive. Tweets without explicit hate keywords are also more difficult to classify.

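Motivation and Objectives

Define the distinction between hate speech and offensive language, and explain why the two need to be separated.
Build a labeled dataset that distinguishes hate speech, offensive language, and neither.
Evaluate classifier performance and analyze errors to understand how separable the classes are.
Identify the linguistic and contextual factors that affect detection accuracy.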

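Proposed Method

Build a hate speech lexicon from Hatebase.org and sample tweets that contain lexicon terms.
Crowd-source the labeling into three classes: hate speech, offensive language, or neither.
Extract TF-IDF-weighted unigram, bigram, and trigram features, along with part-of-speech tags, sentiment, readability, and social features.
Train classifiers with 5-fold cross-validation, comparing logistic regression, naive Bayes, decision trees, random forests, and linear SVMs.
Use logistic regression with L2 regularization in a one-versus-rest framework as the final model, and evaluate on held-out data.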
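
The feature-extraction and final-model steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the whitespace tokenizer, the `log(N/df) + 1` IDF variant, the SGD settings (`l2`, `lr`, `epochs`), all function names, and the toy training tweets with the placeholder tokens `slurword` and `rudeword` are hypothetical. It covers only the TF-IDF n-gram features; the part-of-speech, sentiment, readability, and social features are omitted, as is cross-validation.

```python
import math
from collections import Counter


def ngrams(text, n_max=3):
    """Lowercase, split on whitespace, and return all 1- to n_max-grams."""
    tokens = text.lower().split()
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def fit_idf(docs):
    """Inverse document frequency for every n-gram seen in the training docs."""
    doc_freq = Counter(g for d in docs for g in set(ngrams(d)))
    return {g: math.log(len(docs) / df) + 1.0 for g, df in doc_freq.items()}


def tfidf(doc, idf):
    """Sparse {ngram: weight} vector; n-grams unseen in training are dropped."""
    counts = Counter(g for g in ngrams(doc) if g in idf)
    total = sum(counts.values()) or 1
    return {g: (c / total) * idf[g] for g, c in counts.items()}


def train_binary(vectors, targets, l2=0.001, lr=0.5, epochs=300):
    """One L2-regularized logistic regression, fit by plain SGD (assumed settings)."""
    weights, bias = Counter(), 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, targets):
            z = bias + sum(weights[g] * v for g, v in x.items())
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            error = 1.0 / (1.0 + math.exp(-z)) - y
            for g, v in x.items():
                # Log-loss gradient plus the L2 penalty, applied only to the
                # features active in this example (a simplification).
                weights[g] -= lr * (error * v + l2 * weights[g])
            bias -= lr * error
    return weights, bias


def train_one_vs_rest(vectors, labels):
    """Fit one binary model per class: that class versus all the others."""
    return {
        c: train_binary(vectors, [1.0 if y == c else 0.0 for y in labels])
        for c in sorted(set(labels))
    }


def predict(models, x):
    """Return the class whose binary model gives the highest score."""
    def score(c):
        weights, bias = models[c]
        return bias + sum(weights[g] * v for g, v in x.items())
    return max(models, key=score)


# Toy, made-up training data; "slurword" and "rudeword" are placeholders.
train_docs = [
    "those people are slurword", "slurword go away",
    "you are such a rudeword", "what a rudeword move",
    "lovely weather today", "see you at lunch",
]
train_labels = ["hate", "hate", "offensive", "offensive", "neither", "neither"]

idf = fit_idf(train_docs)
models = train_one_vs_rest([tfidf(d, idf) for d in train_docs], train_labels)
prediction = predict(models, tfidf("slurword again", idf))
```

In practice a library such as scikit-learn would likely replace the hand-written vectorizer and optimizer. The sketch only makes the one-versus-rest idea concrete: one binary model per class, with the highest-scoring class winning.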

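Experimental Results

Research Questions

RQ1: Can a multi-class model reliably distinguish hate speech, offensive language, and neutral content?
RQ2: Which linguistic or contextual features best separate hate speech from offensive language?
RQ3: How well do model predictions agree with human labels, and where are the errors concentrated?
RQ4: Does the presence of explicit hate terms lead to misclassification, and can context mitigate it?
RQ5: Which types of hate speech (e.g., racist vs. sexist) are easier or harder to detect?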

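Key Findings

The best model achieves an overall precision of 0.91, recall of 0.90, and F1 of 0.90.
Roughly 40% of true hate speech tweets are misclassified; for the hate speech class, precision is 0.44 and recall is 0.61.
Hate speech containing strong slurs is easier to detect than hate speech without explicit terms.
When context is ignored, offensive language is often mislabeled as hate speech, while sexist terms tend to be classified as offensive language rather than hate speech.
The model labels only 5% of offensive-language tweets and 2% of harmless tweets as hate speech, indicating some separation between the classes.
Lexicon-based methods have low precision on hate speech, which underscores the value of context and multi-class labeling.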
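
The per-class figures above all derive from a confusion matrix, and a short sketch may help relate them. A class recall of 0.61 means about 39% of true instances of that class were assigned to another class, which is what "roughly 40% misclassified" expresses. The function name `per_class_metrics` and the counts in `CONFUSION` are hypothetical and chosen only for illustration; they are not the paper's data.

```python
def per_class_metrics(confusion, labels):
    """Precision, recall, and F1 per class from a square confusion matrix.

    confusion[i][j] counts items whose true class is labels[i] and whose
    predicted class is labels[j].
    """
    metrics = {}
    for k, label in enumerate(labels):
        true_positive = confusion[k][k]
        predicted_k = sum(row[k] for row in confusion)  # column total
        actual_k = sum(confusion[k])                    # row total
        precision = true_positive / predicted_k if predicted_k else 0.0
        recall = true_positive / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


LABELS = ["hate", "offensive", "neither"]
# Hypothetical counts for illustration only; these are NOT the paper's data.
# Rows are true classes, columns are predicted classes, both in LABELS order.
CONFUSION = [
    [60, 35, 5],
    [50, 930, 20],
    [4, 10, 186],
]

metrics = per_class_metrics(CONFUSION, LABELS)
for label in LABELS:
    m = metrics[label]
    print(f"{label:>9}: P={m['precision']:.2f} R={m['recall']:.2f} F1={m['f1']:.2f}")
```

Precision for a class falls when other classes are predicted as that class (the column total grows), while recall falls when the class's own items are predicted as something else (the diagonal shrinks relative to the row total). A small class can therefore score low on both even when overall figures look strong.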

This review was generated by AI and reviewed by a human editor.