QUICK REVIEW

[Paper Review] To Act or React: Investigating Proactive Strategies For Online Community Moderation

Hussam Habib, Maaz Bin Musa|arXiv (Cornell University)|Jun 27, 2019
Hate Speech and Cyberbullying Detection | Cited by 32
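One-Sentence Summary

This paper proposes a proactive moderation approach for Reddit that uses explainable machine learning to predict which subreddits are likely to evolve into hateful or dangerous communities. It finds that structural connectivity to known problematic subreddits is the most important predictor, and that although joining such communities degrades users' civility, current bans and quarantines fail to correct this behavior, which underscores the need for more effective interventions.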

ABSTRACT

Reddit administrators have generally struggled to prevent or contain such discourse for several reasons including: (1) the inability of a handful of human administrators to track and react to millions of posts and comments per day and (2) fear of backlash as a consequence of administrative decisions to ban or quarantine hateful communities. Consequently, as shown in our background research, administrative actions (community bans and quarantines) are often taken in reaction to media pressure following offensive discourse within a community spilling into the real world with serious consequences. In this paper, we investigate the feasibility of proactive moderation on Reddit -- i.e., proactively identifying communities at risk of committing offenses that previously resulted in bans for other communities. Proactive moderation strategies show promise for two reasons: (1) they have potential to narrow down the communities that administrators need to monitor for hateful content and (2) they give administrators a scientific rationale to back their administrative decisions and interventions. Our work shows that communities are constantly evolving in their user base and topics of discourse and that evolution into hateful or dangerous (i.e., considered bannable by Reddit administrators) communities can often be predicted months ahead of time. This makes proactive moderation feasible. Further, we leverage explainable machine learning to help identify the strongest predictors of evolution into dangerous communities. This provides administrators with insights into the characteristics of communities at risk of becoming dangerous or hateful. Finally, we investigate, at scale, the impact of participation in hateful and dangerous subreddits and the effectiveness of community bans and quarantines on the behavior of members of these communities.

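Motivation and Objectives

  • Investigate whether subreddits evolve over time, challenging the assumption that their user bases and topics are stable.
  • Determine whether it is possible to predict in advance which subreddits may evolve into hateful or dangerous communities.
  • Assess how joining hateful subreddits, and the bans or quarantines that follow, affect the behavior of Reddit users.
  • Give administrators data-driven, explainable tools to justify and improve moderation decisions.
  • Identify the structural and behavioral features that predict a subreddit's evolution toward harmful content.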

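Proposed Method

  • Track subreddit evolution by using similarity measures to quantify how user bases and topic distributions change over time.
  • Develop explainable machine learning models that predict whether a subreddit will be banned or quarantined, based on community, user, moderator, and structural features.
  • Use network-based features, especially user interactions across subreddits, as key predictors in the model.
  • Conduct a large-scale analysis of user behavior before and after joining harmful subreddits and after community-level interventions (bans and quarantines).
  • Use a control-group design that compares users who joined harmful subreddits with similar users who did not, in order to isolate causal effects.
  • Identify and rank the most influential features that predict subreddit evolution, improving interpretability for administrators.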
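
The first step above, quantifying how a subreddit's user base changes over time, can be illustrated with a small sketch. The summary does not say which similarity measure the authors use, so Jaccard similarity between consecutive monthly sets of active users is an assumption made here for illustration, and the function names and sample data are hypothetical.

```python
from __future__ import annotations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two sets; defined as 1.0 when both are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def user_base_similarity(monthly_users: list[set[str]]) -> list[float]:
    """Similarity of each month's active users to the previous month's.

    Values near 1.0 suggest a stable user base; values near 0.0 suggest
    that the community's membership has largely turned over.
    """
    return [
        jaccard(prev, cur)
        for prev, cur in zip(monthly_users, monthly_users[1:])
    ]


# Hypothetical data: active user names for three consecutive months.
months = [{"a", "b", "c"}, {"b", "c", "d"}, {"x", "y"}]
drift = user_base_similarity(months)
print(drift)
```

The same pattern would apply to topics by replacing the user sets with sets of frequent terms, although a distribution-based measure may suit topic data better.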

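Experimental Results

Research Questions

  • RQ1: Do subreddits keep stable user bases and topics over the long term, or do they evolve continuously?
  • RQ2: Can we predict in advance whether a subreddit will evolve into a hateful or dangerous community?
  • RQ3: How does joining a hateful subreddit affect a user's civility in other communities?
  • RQ4: Do Reddit's current community-level interventions (bans and quarantines) effectively reduce incivility among affected users?
  • RQ5: Which community-level features are the strongest predictors of a subreddit's future harmful behavior?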

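Key Findings

  • Subreddits evolve continuously and at a high rate in both user base and topical focus, which makes static moderation ineffective.
  • Structural features, especially user connectivity to known problematic subreddits, are the strongest predictors of a subreddit's future harmful behavior.
  • Joining a hateful subreddit significantly reduces a user's civility in other communities, even after controlling for baseline behavior.
  • Community-level interventions such as bans and quarantines fail to reverse the decline in user civility, indicating limited effectiveness.
  • Subreddits that are eventually banned or quarantined evolve in ways clearly different from stable or non-harmful communities, which makes early detection possible.
  • The explainable machine learning models predict future harmful behavior with reasonably high accuracy, providing a scientific basis for proactive moderation.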
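
To make the structural-connectivity finding concrete, the sketch below computes one possible connectivity feature: the share of a subreddit's users who are also active in already flagged subreddits. This is an illustrative assumption, not the paper's actual feature definition, and all names and data in it are hypothetical.

```python
from __future__ import annotations


def overlap_with_flagged(
    users_by_subreddit: dict[str, set[str]],
    flagged: set[str],
    target: str,
) -> float:
    """Fraction of the target's users who are also active in a flagged subreddit."""
    target_users = users_by_subreddit.get(target, set())
    if not target_users:
        return 0.0
    # Pool the users of every flagged subreddit other than the target itself.
    flagged_users: set[str] = set()
    for name in flagged:
        if name != target:
            flagged_users |= users_by_subreddit.get(name, set())
    return len(target_users & flagged_users) / len(target_users)


# Hypothetical data: users active in each subreddit.
users_by_subreddit = {
    "sub_a": {"u1", "u2", "u3", "u4"},
    "sub_b": {"u5"},
    "sub_banned": {"u1", "u2", "u9"},
}
flagged = {"sub_banned"}

for name in ("sub_a", "sub_b"):
    print(name, overlap_with_flagged(users_by_subreddit, flagged, name))
```

A feature like this could be one column among the community, user, moderator, and structural features fed to a classifier. The summary does not describe how the authors weight or normalize it.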

This review was generated by AI and reviewed by a human editor.