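[Paper Summary] Characterizing and Detecting Hateful Users on Twitter
This paper characterizes hateful Twitter users with a user-centric, graph-based approach and shows that semi-supervised node embeddings over the retweet graph outperform content-only methods at detecting both hateful and suspended users.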
Most current approaches to characterize and detect hate speech focus on \textit{content} posted in Online Social Networks. They face shortcomings to collect and annotate hateful speech due to the incompleteness and noisiness of OSN text and the subjectivity of hate speech. These limitations are often aided with constraints that oversimplify the problem, such as considering only tweets containing hate-related words. In this work we partially address these issues by shifting the focus towards \textit{users}. We develop and employ a robust methodology to collect and annotate hateful users which does not depend directly on lexicon and where the users are annotated given their entire profile. This results in a sample of Twitter's retweet graph containing $100,386$ users, out of which $4,972$ were annotated. We also collect the users who were banned in the three months that followed the data collection. We show that hateful users differ from normal ones in terms of their activity patterns, word usage, and network structure. We obtain similar results comparing the neighbors of hateful vs. neighbors of normal users and also suspended users vs. active users, increasing the robustness of our analysis. We observe that hateful users are densely connected, and thus formulate the hate speech detection problem as a task of semi-supervised learning over a graph, exploiting the network of connections on Twitter. We find that a node embedding algorithm, which exploits the graph structure, outperforms content-based approaches for the detection of both hateful ($95\%$ AUC vs $88\%$ AUC) and suspended users ($93\%$ AUC vs $88\%$ AUC). Altogether, we present a user-centric view of hate speech, paving the way for better detection and understanding of this relevant and challenging issue.
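The AUC figures quoted in the abstract can be read as the probability that a randomly chosen positive user (hateful or suspended) receives a higher score than a randomly chosen negative one. The sketch below computes that quantity directly from pairwise comparisons; the function name `pairwise_auc` and the toy labels and scores are illustrative assumptions, not data or code from the paper.

```python
def pairwise_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = hateful, 0 = normal.
toy_labels = [1, 1, 0, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.3, 0.1]
toy_auc = pairwise_auc(toy_labels, toy_scores)  # 5 of 6 pairs are ordered correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect one, which is why a gap such as 95% vs 88% reflects a meaningfully better ordering of users.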
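Research Motivation and Goals
- Develop a procedure for collecting and annotating hateful users that does not rely on lexicon-heavy sampling.
- Characterize how hateful users differ from normal users in activity, vocabulary, and network structure.
- Explore neighborhood and suspension signals as proxies for hateful content to improve detection.
- Evaluate graph-based semi-supervised learning with user-level features for hate detection.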
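Proposed Method
- Build a random-walk-based sample of Twitter's retweet graph containing 100,386 users, with up to 200 tweets per user.
- Identify users who use words from a hate-related lexicon as seeds, then run a diffusion process to spread belief across the graph.
- Draw a stratified sample of 4,972 users, stratified by their diffused belief, for crowdsourced annotation.
- Have crowdworkers label each user as hateful or normal, given the context of the user's entire profile.
- Analyze differences in activity, vocabulary, and network centrality between hateful and normal users, between their neighbors, and between suspended and active users.
- Evaluate a node-embedding detection method (GraphSage) against traditional models, both using user features and GloVe-based text features.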
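The seed, diffusion, and stratified-sampling steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes a simple averaging diffusion (each user's belief becomes the mean of the beliefs of the users they take influence from), two iterations, and four equal-width belief strata; the edge convention, the function names, and the toy graph are all hypothetical.

```python
import numpy as np


def diffuse_beliefs(adj, seeds, n_iter=2):
    """Spread seed beliefs by repeatedly averaging over influencing nodes.

    Assumed convention: adj[i][j] = 1 means user i takes belief from user j.
    """
    adj = np.asarray(adj, dtype=float)
    row_sum = adj.sum(axis=1, keepdims=True)
    # Row-normalize so each row is an average over the node's influencers.
    trans = np.divide(adj, row_sum, out=np.zeros_like(adj), where=row_sum > 0)
    # Nodes with no influencers keep their own belief (self-loop).
    alone = np.flatnonzero(row_sum.ravel() == 0)
    trans[alone, alone] = 1.0

    belief = np.zeros(adj.shape[0])
    belief[list(seeds)] = 1.0  # lexicon-matched users start with belief 1
    for _ in range(n_iter):
        belief = trans @ belief
    return belief


def stratified_sample(belief, per_stratum, edges=(0.0, 0.25, 0.5, 0.75, 1.0), seed=0):
    """Sample up to `per_stratum` node ids from each belief interval."""
    rng = np.random.default_rng(seed)
    # Interior edges map each belief to a stratum index 0 .. len(edges) - 2.
    strata = np.digitize(belief, edges[1:-1])
    chosen = []
    for s in range(len(edges) - 1):
        members = np.flatnonzero(strata == s)
        k = min(per_stratum, members.size)
        if k == 0:
            continue  # empty stratum, nothing to draw
        chosen.extend(rng.choice(members, size=k, replace=False).tolist())
    return sorted(chosen)


# Hypothetical 6-user graph; user 0 is the only lexicon-matched seed.
toy_adj = [
    [0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1, 0],
]
toy_belief = diffuse_beliefs(toy_adj, seeds={0})
toy_sample = stratified_sample(toy_belief, per_stratum=1)
```

Sampling per stratum rather than uniformly is what lets the annotated set include enough likely-hateful users without restricting it to users who matched the lexicon.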
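Experimental Results
Research Questions
- RQ1: Do hateful users differ from normal users in activity, vocabulary, and network structure?
- RQ2: Can neighborhood and suspension signals serve as proxies for hate speech on Twitter?
- RQ3: Does graph-based semi-supervised learning improve the detection of hateful and suspended users compared with content-only methods?
- RQ4: How do changes to Twitter's guidelines relate to the suspension of hateful accounts?
- RQ5: Does the retweet network structure provide information for hate speech detection beyond what content features offer?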
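Key Findings
- Hateful users are more active than normal users, follow more accounts per day, and have more recently created accounts.
- Hateful users are densely connected and more central in the retweet network.
- Hateful users' vocabulary is not what one might expect: it contains more words related to love and masculinity and relatively fewer words related to hate and anger.
- Node embeddings over the retweet graph, combined with user features and GloVe, achieve the best performance (AUC up to 95.4% for hateful-user detection and 93.3% for suspended-user detection).
- GraphSage-based semi-supervised embeddings outperform GradBoost and AdaBoost, which use content features only, on both detection tasks.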
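To make concrete why a GraphSage-style model can use the retweet graph where content-only models cannot, here is a rough sketch of a single mean-aggregation layer: each user's representation is built from their own features concatenated with the average of their neighbors' features. It assumes full (unsampled) neighborhoods and random, untrained weights, and the function name and toy inputs are hypothetical, so it illustrates the mechanism rather than the configuration evaluated in the paper.

```python
import numpy as np


def graphsage_mean_layer(features, neighbors, weight):
    """One GraphSAGE-style layer with a mean aggregator.

    features:  (n, d) float array of per-user features
    neighbors: list of n lists with each user's neighbor ids
    weight:    (2 * d, d_out) projection matrix
    """
    features = np.asarray(features, dtype=float)
    agg = np.zeros_like(features)
    for v in range(features.shape[0]):
        if len(neighbors[v]) > 0:
            agg[v] = features[neighbors[v]].mean(axis=0)  # average neighbor features

    combined = np.concatenate([features, agg], axis=1)  # self and neighborhood, (n, 2d)
    hidden = np.maximum(combined @ weight, 0.0)         # linear map followed by ReLU
    norms = np.linalg.norm(hidden, axis=1, keepdims=True)
    return hidden / np.maximum(norms, 1e-12)            # L2-normalize each row


# Hypothetical toy input: 4 users, 3 features each, 2-dimensional output.
rng = np.random.default_rng(0)
toy_features = rng.normal(size=(4, 3))
toy_neighbors = [[1, 2], [0], [0, 3], [2]]
toy_weight = rng.normal(size=(6, 2))  # untrained, for illustration only
toy_embeddings = graphsage_mean_layer(toy_features, toy_neighbors, toy_weight)
```

In a trained model the weights are learned from the annotated users, and stacking such layers lets labels on a few users inform predictions for their unlabeled neighbors.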
This summary was generated by AI and reviewed by a human editor.