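[Paper Review] Measuring the Reliability of Hate Speech Annotations: The Case of the European Refugee Crisis
The paper evaluates how reliably hate speech can be annotated and whether showing annotators a definition improves that reliability. It finds very low agreement and recommends annotation schemes that are more fine-grained than a binary yes/no.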
Some users of social media are spreading racist, sexist, and otherwise hateful content. For the purpose of training a hate speech detection system, the reliability of the annotations is crucial, but there is no universally agreed-upon definition. We collected potentially hateful messages and asked two groups of internet users to determine whether they were hate speech or not, whether they should be banned or not and to rate their degree of offensiveness. One of the groups was shown a definition prior to completing the survey. We aimed to assess whether hate speech can be annotated reliably, and the extent to which existing definitions are in accordance with subjective ratings. Our results indicate that showing users a definition caused them to partially align their own opinion with the definition but did not improve reliability, which was very low overall. We conclude that the presence of hate speech should perhaps not be considered a binary yes-or-no decision, and raters need more detailed instructions for the annotation.
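Motivation and Objectives
- Estimate the inter-annotator reliability of hate speech annotations on a Twitter corpus related to the refugee crisis.
- Assess how showing annotators a formal definition of hate speech affects reliability and annotation decisions.
- Assess whether hate speech should be treated as a binary label or placed on a continuous scale such as offensiveness.
- Provide guidance for building more reliable hate speech datasets and classifiers.
Proposed Method
- Build a German hate speech corpus from 541 tweets related to the refugee crisis.
- Run two online surveys in a between-subjects design with 56 participants: one group saw a definition based on Twitter's, the other saw no definition.
- Have each participant judge 20 tweets on three questions: whether the tweet is hate speech (yes/no), whether it should be banned (yes/no), and how offensive it is (6-point scale).
- Compute Krippendorff’s alpha to assess inter-annotator reliability for each group and each question.
- Compare the responses of the definition and no-definition groups, and analyze the correlation between the two groups' per-tweet hate speech judgments.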
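The reliability step can be illustrated with a minimal sketch of Krippendorff's alpha for nominal labels, which fits the yes/no questions. This is not the authors' code: the function name `krippendorff_alpha_nominal` and the toy ratings are assumptions made for illustration, and the 6-point offensiveness scale would need an ordinal or interval distance rather than the nominal one used here.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    `units` is a list of items; each item is a list of the labels that
    annotators assigned to it, with None marking a missing rating.
    """
    # Coincidence matrix: every ordered pair of labels within one item
    # contributes 1 / (m - 1), where m is the number of ratings of that item.
    coincidences = Counter()
    for ratings in units:
        values = [r for r in ratings if r is not None]
        m = len(values)
        if m < 2:
            continue  # an item with fewer than two ratings cannot be paired
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per label and overall number of pairable ratings.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least two pairable ratings")

    # Observed and expected disagreement (both scaled by n, which cancels).
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label is used")
    return 1.0 - observed / expected


if __name__ == "__main__":
    # Hypothetical toy data: 4 tweets, 2 annotators, 1 = "hate speech", 0 = "not".
    toy_ratings = [[1, 1], [1, 0], [0, 0], [0, 0]]
    print(round(krippendorff_alpha_nominal(toy_ratings), 3))
```

An alpha of 1 means perfect agreement and values near 0 mean agreement no better than chance, so a low alpha signals that the labels are unreliable as training targets.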
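Experimental Results
Research Questions
- RQ1: Does providing a formal definition improve the reliability of hate speech annotations?
- RQ2: How high is the inter-annotator reliability of hate speech annotations in this dataset?
- RQ3: How do binary hate speech judgments compare with perceived offensiveness and with ban decisions?
- RQ4: Should hate speech annotation be modeled as a regression problem or as binary classification?
Key Findings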
| Group | Participants | Age (mean) | Gender (% female) | Hate Speech (% yes) | Ban (% yes) | Offensive (mean) |
|---|---|---|---|---|---|---|
| Def. | 25 | 33.3 | 43.5 | 32.6 | 32.6 | 3.49 |
| No def. | 31 | 30.5 | 58.6 | 40.3 | 17.6 | 3.42 |
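- Inter-annotator reliability was very low, with Krippendorff’s alpha ranging from 0.18 to 0.29.
- Showing the Twitter definition brought participants' judgments closer to the definition but did not improve overall reliability.
- Participants who saw the definition were more likely to suggest banning a tweet than those who did not (32.6% vs. 17.6% "yes"), a significant difference on the Ban decision.
- The two groups were strongly correlated in how far each tweet was judged to be hate speech (r = .895, p < .0001).
- The authors recommend collecting multiple labels per tweet and considering a regression-style approach that captures the degree of hatefulness instead of a binary label.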
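The per-tweet comparison between groups and the regression-style recommendation can be sketched together: aggregate each tweet's yes/no votes into a share of "yes" answers (a graded target rather than a binary one), then correlate the shares across groups. The helper names `yes_share` and `pearson_r` and all vote data below are hypothetical and do not come from the paper.

```python
from math import sqrt


def yes_share(votes):
    """Fraction of annotators who answered "yes" (1) for one tweet."""
    if not votes:
        raise ValueError("a tweet needs at least one vote")
    return sum(votes) / len(votes)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical votes: one inner list per tweet, one entry per annotator.
    definition_group = [[1, 1, 0, 1], [0, 0, 0, 1], [1, 0, 1, 1], [0, 0, 0, 0]]
    no_definition_group = [[1, 1, 1, 0], [0, 1, 0, 0], [1, 1, 0, 1], [0, 0, 1, 0]]

    # Graded per-tweet scores in [0, 1] instead of a single binary label.
    shares_def = [yes_share(v) for v in definition_group]
    shares_no_def = [yes_share(v) for v in no_definition_group]
    print(shares_def, shares_no_def)
    print(pearson_r(shares_def, shares_no_def))
```

A high correlation of this kind can coexist with a low alpha: the groups may rank tweets similarly on average even when individual annotators disagree on any single tweet, which is one argument for graded targets.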
This review was generated by AI and checked by a human editor.