QUICK REVIEW

[Paper Review] Offensive Language and Hate Speech Detection for Danish

Gudbjartur Ingi Sigurbergsson|arXiv (Cornell University)|Aug 13, 2019

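Hate Speech and Cyberbullying Detection | References: 23 | Cited by: 23

One-Sentence Summary

This paper introduces the first manually annotated Danish dataset for offensive language and hate speech detection, built from user comments on Reddit and Facebook. The authors develop classification systems for both English and Danish. The best Danish system reaches a macro-averaged F1 of 0.70 for offensive language detection and 0.73 for detecting whether an offensive post is targeted. The results suggest that shared language resources and cross-lingual modeling can help low-resource languages such as Danish.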

ABSTRACT

The presence of offensive language on social media platforms and the implications this poses is becoming a major concern in modern society. Given the enormous amount of content created every day, automatic methods are required to detect and deal with this type of content. Until now, most of the research has focused on solving the problem for the English language, while the problem is multilingual. We construct a Danish dataset containing user-generated comments from Reddit and Facebook. It contains user generated comments from various social media platforms, and to our knowledge, it is the first of its kind. Our dataset is annotated to capture various types and target of offensive language. We develop four automatic classification systems, each designed to work for both the English and the Danish language. In the detection of offensive language in English, the best performing system achieves a macro averaged F1-score of 0.74, and the best performing system for Danish achieves a macro averaged F1-score of 0.70. In the detection of whether or not an offensive post is targeted, the best performing system for English achieves a macro averaged F1-score of 0.62, while the best performing system for Danish achieves a macro averaged F1-score of 0.73. Finally, in the detection of the target type in a targeted offensive post, the best performing system for English achieves a macro averaged F1-score of 0.56, and the best performing system for Danish achieves a macro averaged F1-score of 0.63. Our work for both the English and the Danish language captures the type and targets of offensive language, and present automatic methods for detecting different kinds of offensive language such as hate speech and cyberbullying.
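The three detection tasks in the abstract form a cascade. Only offensive posts are checked for a target, and only targeted posts receive a target type. The minimal Python sketch below captures that dependency. It assumes OLID-style tag names (NOT/OFF, UNT/TIN, IND/GRP/OTH). These tag strings and the `validate_labels` helper are illustrative assumptions, not taken from the paper's data release.

```python
from typing import Optional

# Assumed OLID-style tag names; the summary does not spell out the exact strings.
OFFENSIVE = {"NOT", "OFF"}           # sub-task A: offensive or not
TARGETED = {"UNT", "TIN"}            # sub-task B: untargeted or targeted insult
TARGET_TYPE = {"IND", "GRP", "OTH"}  # sub-task C: individual, group, other


def validate_labels(a: str, b: Optional[str], c: Optional[str]) -> bool:
    """Check that an (A, B, C) label triple respects the annotation cascade.

    B is defined only for offensive posts, and C only for targeted ones.
    """
    if a not in OFFENSIVE:
        return False
    if a == "NOT":
        # Non-offensive posts carry no target or target-type label.
        return b is None and c is None
    if b not in TARGETED:
        # Offensive posts must say whether they are targeted.
        return False
    if b == "UNT":
        # Untargeted offensive posts carry no target type.
        return c is None
    return c in TARGET_TYPE
```

Under these assumptions, `validate_labels("OFF", "TIN", "GRP")` returns True. `validate_labels("NOT", "TIN", None)` returns False, because a non-offensive post carries no target label.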

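Research Motivation and Objectives

Address the lack of annotated datasets for offensive language and hate speech detection in Danish.
Develop multilingual classification systems that perform well on both English and Danish.
Analyze the linguistic challenges of offensive language detection, such as obfuscation and context dependence.
Evaluate model performance on each sub-task: offensive language detection, detection of targeted offenses, and target type classification.
Release high-quality data and models under a CC-BY license for research and shared tasks.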

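Proposed Methods

Build a Danish dataset from user-generated comments on Reddit and Facebook, annotated for offensiveness, targeting, and target type according to a standardized guideline.
Use multilingual BERT for transfer learning, fine-tuning English and Danish models on the three sub-tasks.
Compare logistic regression and AUX-Fast-BiLSTM, a Fast-BiLSTM variant with auxiliary features, against the BERT-based models.
Apply TF-IDF and n-gram analysis to misclassified samples to identify recurring failure modes, such as obfuscation and over-reliance on keywords.
Manually inspect misclassified samples to diagnose model weaknesses, focusing on context and obfuscated terms.
Evaluate all sub-tasks with macro-averaged F1, and use precision, recall, and confusion matrices to assess the effects of class imbalance and data quality.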
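Macro-averaged F1 gives every class equal weight. That makes it stricter than accuracy when offensive posts are a minority. Below is a minimal pure-Python sketch of the metric. The `macro_f1` name is hypothetical. scikit-learn's `f1_score(..., average="macro")` should give the same value on the same labels.

```python
from collections import Counter
from typing import Sequence


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over every label in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the prediction was wrong
            fn[t] += 1  # the true class t was missed
    scores = []
    for label in labels:
        pred_pos = tp[label] + fp[label]
        true_pos = tp[label] + fn[label]
        precision = tp[label] / pred_pos if pred_pos else 0.0
        recall = tp[label] / true_pos if true_pos else 0.0
        # A class with zero precision and recall contributes an F1 of 0.
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)
```

Consider a toy split with nine NOT posts and one OFF post. A classifier that always predicts NOT scores 0.9 accuracy but only 9/19 ≈ 0.47 macro-F1, because the OFF class contributes an F1 of 0.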

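Experimental Results

Research Questions

RQ1 How does the performance of multilingual models on offensive language detection in Danish compare with English?
RQ2 Which linguistic patterns (e.g., obfuscation or context-dependent profanity) cause persistent misclassifications in offensive language detection?
RQ3 To what extent do data quality and class imbalance affect model performance in Danish offensive language detection?
RQ4 How do different model architectures, including BERT and auxiliary-feature models, compare at detecting targeted offensive language and target types?
RQ5 Can shared language resources and transfer learning improve performance on a low-resource language such as Danish?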

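Key Findings

The best Danish model for offensive language detection reached a macro-averaged F1 of 0.70, slightly below the best English model on the same task (0.74).
For targeted offensive language detection, the best Danish model reached a macro-averaged F1 of 0.73, compared with 0.62 for the best English model.
For target type classification, the best Danish model reached a macro-averaged F1 of 0.63 versus 0.56 for English. This hints at better performance on the fine-grained sub-task, although the English and Danish scores come from different datasets and are not directly comparable.
The classifiers struggled with obfuscated offensive terms (e.g., 'barrrysoetorobullshit' and 'Hahhaaha lær det biiiiiaaaatch') and often misclassified them as non-offensive.
The models relied heavily on the presence of particular keywords (e.g., 'she', 'svensken', 'pikfjæs') rather than on contextual meaning. This produced many false positives in offensive language detection.
Data quality issues were evident. Some clearly targeted insults (e.g., 'HillaryForPrison') were incorrectly labeled as untargeted in the test set, undermining the reliability of the evaluation.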
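A toy lexicon matcher reproduces the obfuscation failures. This is a hypothetical illustration, not a technique from the paper. Whole-token lookup misses elongated and concatenated forms. Collapsing repeated characters and matching substrings recovers them. `LEXICON`, `squeeze`, `exact_hits`, and `squeezed_hits` are all assumed names.

```python
import re
from typing import Iterable, List

# Hypothetical toy lexicon for illustration only; not taken from the paper.
LEXICON = {"biatch", "bullshit"}

# Matches any character followed by one or more copies of itself.
_REPEATS = re.compile(r"(.)\1+")


def squeeze(text: str) -> str:
    """Lowercase and collapse each run of a repeated character to one copy."""
    return _REPEATS.sub(r"\1", text.lower())


def exact_hits(text: str, lexicon: Iterable[str]) -> List[str]:
    """Naive whole-token lookup, the kind of matching obfuscation defeats."""
    vocab = set(lexicon)
    return [tok for tok in text.lower().split() if tok in vocab]


def squeezed_hits(text: str, lexicon: Iterable[str]) -> List[str]:
    """Lexicon entries found as substrings after squeezing both sides."""
    normalized = squeeze(text)
    return sorted(w for w in set(lexicon) if squeeze(w) in normalized)
```

With this lexicon, `exact_hits("Hahhaaha lær det biiiiiaaaatch", LEXICON)` finds nothing. `squeezed_hits` on the same text returns `["biatch"]`, and on 'barrrysoetorobullshit' it returns `["bullshit"]`. Substring matching trades one failure for another, however. It flags any token that merely contains a listed word, which mirrors the keyword-driven false positives described above. Normalization alone therefore does not solve the context problem.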

This review was generated by AI and reviewed by human editors.