QUICK REVIEW

[Paper Review] Thou shalt not hate: Countering Online Hate Speech

Binny Mathew, Hardik Tharad|arXiv (Cornell University)|Jan 1, 2018

Hate Speech and Cyberbullying Detection | References: 21 | Cited by: 27

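One-Sentence Summary

This paper introduces the first large-scale, manually annotated dataset of counterspeech comments from YouTube, enabling the first rigorous linguistic analysis of counterspeech. The authors build machine learning models that detect counterspeech with an F1-score of 0.71 and classify the types of counterspeech in a multilabel setting with an F1-score of 0.60, and they characterize the dynamics of counterspeech, its effectiveness, and its psycholinguistic differences from hate speech.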

ABSTRACT

Hate content in social media is ever-increasing. While Facebook, Twitter, Google have attempted to take several steps to tackle the hateful content, they have mostly been unsuccessful. Counterspeech is seen as an effective way of tackling the online hate without any harm to the freedom of speech. Thus, an alternative strategy for these platforms could be to promote counterspeech as a defense against hate content. However, in order to have a successful promotion of such counterspeech, one has to have a deep understanding of its dynamics in the online world. Lack of carefully curated data largely inhibits such understanding. In this paper, we create and release the first ever dataset for counterspeech using comments from YouTube. The data contains 13,924 manually annotated comments where the labels indicate whether a comment is a counterspeech or not. This data allows us to perform a rigorous measurement study characterizing the linguistic structure of counterspeech for the first time. This analysis results in various interesting insights such as: the counterspeech comments receive much more likes as compared to the non-counterspeech comments, for certain communities majority of the non-counterspeech comments tend to be hate speech, the different types of counterspeech are not all equally effective and the language choice of users posting counterspeech is largely different from those posting non-counterspeech as revealed by a detailed psycholinguistic analysis. Finally, we build a set of machine learning models that are able to automatically detect counterspeech in YouTube videos with an F1-score of 0.71. We also build multilabel models that can detect different types of counterspeech in a comment with an F1-score of 0.60.

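Motivation and Objectives

Address the growing problem of online hate speech on social media platforms such as YouTube, Facebook, and Twitter.
Close the gap left by the lack of carefully curated data for studying counterspeech, a non-censoring strategy that does not suppress freedom of speech.
Create and release the first large-scale, manually annotated dataset of YouTube comments labeled for the presence or absence of counterspeech.
Carry out a comprehensive linguistic and psycholinguistic analysis of counterspeech to understand its structure and behavioral dynamics.
Develop machine learning models that automatically detect counterspeech, and its categories, in online comments.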

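Proposed Method

Curated and manually annotated 13,924 YouTube comments to create the first publicly available counterspeech dataset.
Applied linguistic and psycholinguistic analysis to compare language use in counterspeech and non-counterspeech comments.
Trained supervised machine learning models on lexical, syntactic, and sentiment features extracted from the comments.
Built a binary classifier (counterspeech vs. non-counterspeech) and multilabel classifiers (for the different types of counterspeech).
Evaluated model performance on the annotated dataset using standard NLP metrics, including the F1-score.
Used statistical analysis to compare counterspeech and non-counterspeech comments in terms of engagement (number of likes) and content patterns.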
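The evaluation step above reports F1-scores. As a reminder of what that metric measures, here is a minimal sketch in plain Python that computes precision, recall, and F1 for a binary counterspeech classifier. The function name `precision_recall_f1` and the toy labels are illustrative assumptions; they are not taken from the paper or its dataset.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the class marked `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Hypothetical toy labels: 1 = counterspeech, 0 = non-counterspeech.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

F1 is the harmonic mean of precision and recall, so a score such as 0.71 means the model balances missed counterspeech against false alarms rather than optimizing only one of the two.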

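Experimental Results

Research Questions

RQ1: Which linguistic and psycholinguistic features distinguish counterspeech comments from non-counterspeech comments on YouTube?
RQ2: How does engagement with counterspeech comments (e.g., number of likes) compare with engagement with non-counterspeech comments?
RQ3: Which types of counterspeech are most effective at curbing hate speech in online communities?
RQ4: How do the language patterns of users who post counterspeech differ from those of users who post hate speech?
RQ5: How accurately can machine learning models detect counterspeech and its subtypes in real-world YouTube comment sections?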

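Key Findings

Counterspeech comments receive substantially more likes than non-counterspeech comments, which suggests higher user engagement and that readers find them more valuable.
For certain communities, the majority of non-counterspeech comments are hate speech, underscoring how widespread toxic speech is.
The different types of counterspeech are not equally effective, which suggests that strategically varying the style of response may increase its impact.
Psycholinguistic analysis reveals marked differences in language use between users who post counterspeech and those who post non-counterspeech, particularly in emotional tone and lexical complexity.
The binary machine learning model detects counterspeech with an F1-score of 0.71, a solid baseline on the new dataset.
The multilabel model for classifying counterspeech types reaches an F1-score of 0.60, which shows that detecting fine-grained response categories is feasible but challenging.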
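In the multilabel setting, a single comment can carry several counterspeech types at once, so the F1-score has to be aggregated across labels. The sketch below shows one common choice, micro-averaged F1, in plain Python. This review does not say which averaging scheme the authors used, so micro-averaging is an assumption here, and the category names (`facts`, `humor`, `warning`) are hypothetical placeholders rather than the dataset's actual label set.

```python
def micro_f1(true_sets, pred_sets):
    """Micro-averaged F1 over per-comment label sets.

    True positives, false positives, and false negatives are pooled
    across all comments and labels before F1 is computed.
    """
    tp = fp = fn = 0
    for true_labels, pred_labels in zip(true_sets, pred_sets):
        tp += len(true_labels & pred_labels)   # labels correctly predicted
        fp += len(pred_labels - true_labels)   # labels predicted but not annotated
        fn += len(true_labels - pred_labels)   # annotated labels that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical annotations: each set holds the counterspeech types of one comment.
true_sets = [{"facts"}, {"humor", "facts"}, {"warning"}, set()]
pred_sets = [{"facts"}, {"humor"}, {"warning", "humor"}, set()]

score = micro_f1(true_sets, pred_sets)
print(f"micro-F1 = {score:.2f}")
```

Micro-averaging weights every label decision equally, so frequent categories dominate the score; macro-averaging (the mean of per-label F1-scores) would instead give rare categories equal weight.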

This review was generated by AI and checked by a human editor.