QUICK REVIEW

[Paper Review] Deep Learning for Hate Speech Detection: A Comparative Study

Jitendra Singh Malik, Hezhe Qiao | arXiv (Cornell University) | Feb 19, 2022

Hate Speech and Cyberbullying Detection | Cited by 37

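One-Sentence Summary

This paper presents a large-scale empirical comparison of 14 shallow and deep hate speech detectors on three public benchmarks, evaluating their effectiveness, efficiency, the impact of pre-training, and cross-domain generalization.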

ABSTRACT

Automated hate speech detection is an important tool in combating the spread of hate speech, particularly in social media. Numerous methods have been developed for the task, including a recent proliferation of deep-learning based approaches. A variety of datasets have also been developed, exemplifying various manifestations of the hate-speech detection problem. We present here a large-scale empirical comparison of deep and shallow hate-speech detection methods, mediated through the three most commonly used datasets. Our goal is to illuminate progress in the area, and identify strengths and weaknesses in the current state-of-the-art. We particularly focus our analysis on measures of practical performance, including detection accuracy, computational efficiency, capability in using pre-trained models, and domain generalization. In doing so we aim to provide guidance as to the use of hate-speech detection in practice, quantify the state-of-the-art, and identify future research directions. Code and dataset are available at https://github.com/jmjmalik22/Hate-Speech-Detection.

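Research Motivation and Goals

Evaluate how different hate speech detection models perform across diverse datasets.
Identify models that offer a favorable trade-off between accuracy and efficiency.
Assess the impact of pre-training methods on detector performance.
Examine cross-domain generalization to understand how domain shift affects hate speech detectors.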

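Proposed Method

Group detectors into three categories: shallow traditional methods, word-embedding-based deep methods, and transformer-based deep methods.
Evaluate 14 detectors built on TF-IDF features, GloVe embeddings, and transformer-based embeddings (BERT, ALBERT, ELECTRA, and others).
Combine these representations with classifiers (SVM, XGB, MLP, CNN, Bi-LSTM) and report macro and weighted F1 scores.
Use three class-imbalanced datasets (Davidson, Founta, TSA) and report per-class metrics.
Analyze computational efficiency through per-epoch training time and identify practical trade-offs between model cost and accuracy.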
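Because the datasets are class-imbalanced, macro and weighted F1 can diverge noticeably. The following is a minimal pure-Python sketch of how the two averages are commonly defined; it is not taken from the paper's code. The function names and the toy labels are illustrative assumptions.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by each class's share of true labels."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in scores) / len(y_true)


# Toy, hand-made labels (assumption): 8 "none" examples and 2 "hate" examples,
# with one "hate" example misclassified as "none".
toy_true = ["none"] * 8 + ["hate"] * 2
toy_pred = ["none"] * 8 + ["none", "hate"]
toy_macro = macro_f1(toy_true, toy_pred)
toy_weighted = weighted_f1(toy_true, toy_pred)
```

On this toy example the weighted score comes out higher than the macro score, because the majority class dominates the weighted average while the macro average exposes the weaker minority-class F1. That is one reason to report both, alongside per-class metrics.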

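Experimental Results

Research Questions

RQ1: How effective are popular hate speech detectors on diverse benchmark datasets?
RQ2: Is there a model that generally outperforms the others in both effectiveness and efficiency?
RQ3: How do pre-training methods affect hate speech detection performance?
RQ4: How well do models generalize across domains with different hate speech definitions and distributions?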

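Key Findings

Transformer-based embeddings (BERT, ALBERT, ELECTRA) consistently deliver the best macro and weighted F1 scores across all datasets.
TF-IDF-based XGBoost and TF-IDF-based MLP can be competitive, especially on larger or more balanced portions of the data, but transformers generally lead in performance.
Transformer models take longer to train; among the transformers, Small BERT is the most efficient.
Pre-trained embeddings, especially transformer-based ones, outperform non-pre-trained or TF-IDF baselines on all three datasets.
Cross-domain evaluation shows that generalization degrades when transferring from one dataset to another, but ELECTRA-CNN usually retains strong performance, making it a robust choice.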
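The cross-domain protocol (train on one dataset, test on every dataset) can be sketched as a small evaluation harness. This is an illustrative sketch only: the word-count detector is a toy stand-in, not one of the paper's 14 detectors, the dataset names "A" and "B" and their sentences are made up, and accuracy is used instead of F1 purely to keep the example short.

```python
from collections import Counter


def fit_word_counts(texts, labels):
    """Toy stand-in detector (assumption): word counts per label."""
    model = {}
    for text, label in zip(texts, labels):
        model.setdefault(label, Counter()).update(text.lower().split())
    return model


def predict_word_counts(model, texts):
    """Predict the label whose training words overlap the text the most."""
    predictions = []
    for text in texts:
        tokens = text.lower().split()
        # Ties resolve to the alphabetically first label, since max() keeps
        # the first maximum it encounters.
        best = max(sorted(model),
                   key=lambda label: sum(model[label][tok] for tok in tokens))
        predictions.append(best)
    return predictions


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_domain_matrix(datasets, fit, predict, metric):
    """Return {(source, target): score}, fitting once per source dataset."""
    results = {}
    for source, splits in datasets.items():
        model = fit(*splits["train"])
        for target, target_splits in datasets.items():
            test_texts, test_labels = target_splits["test"]
            results[(source, target)] = metric(test_labels,
                                               predict(model, test_texts))
    return results


# Hypothetical two-domain toy data with disjoint vocabularies.
toy_datasets = {
    "A": {"train": (["you are awful", "have a nice day"], ["hate", "none"]),
          "test": (["awful people", "nice weather"], ["hate", "none"])},
    "B": {"train": (["total scum", "lovely morning"], ["hate", "none"]),
          "test": (["what scum", "lovely evening"], ["hate", "none"])},
}
results = cross_domain_matrix(toy_datasets, fit_word_counts,
                              predict_word_counts, accuracy)
```

The diagonal entries of `results` are in-domain scores and the off-diagonal entries are cross-domain scores. In this toy setup the off-diagonal scores are lower because the two domains share no vocabulary, which mirrors the kind of degradation described above without reproducing any number from the paper.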

This review was generated by AI and reviewed by a human editor.