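[Paper Summary] The Velocity of Censorship: High-Fidelity Detection of Microblog Post Deletions
This paper presents a high-fidelity method for detecting post deletions on Sina Weibo, based on systematic, repeated monitoring of sensitive users' content. Focusing on original posts, it finds that nearly 30% of deletions occur 5 to 30 minutes after posting and nearly 90% within 24 hours. This suggests a fast, time-sensitive censorship process, possibly driven by keyword filtering and by systems that track how much attention a post receives.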
Weibo and other popular Chinese microblogging sites are well known for exercising internal censorship, to comply with Chinese government requirements. This research seeks to quantify the mechanisms of this censorship: how fast and how comprehensively posts are deleted. Our analysis considered 2.38 million posts gathered over roughly two months in 2012, with our attention focused on repeatedly visiting "sensitive" users. This gives us a view of censorship events within minutes of their occurrence, albeit at the cost of our data no longer representing a random sample of the general Weibo population. We also have a larger sample of 470 million posts from Weibo's public timeline, taken over a longer time period, that is more representative of a random sample. We found that deletions happen most heavily in the first hour after a post has been submitted. Focusing on original posts, not reposts/retweets, we observed that nearly 30% of the total deletion events occur within 5 to 30 minutes. Nearly 90% of the deletions happen within the first 24 hours. Leveraging our data, we also considered a variety of hypotheses about the mechanisms used by Weibo for censorship, such as the extent to which Weibo's censors use retrospective keyword-based censorship, and how repost/retweet popularity interacts with censorship. We also used natural language processing techniques to analyze which topics were more likely to be censored.
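One hypothesis the abstract raises, how repost popularity interacts with censorship, can be explored first with a simple descriptive table: the share of posts deleted within each repost-count bucket. The sketch below is illustrative only. The function name `deletion_rate_by_reposts` and the bucket edges are hypothetical choices, and this is not the paper's analysis. A raw rate comparison like this cannot separate cause from correlation, because popular posts may also be concentrated on sensitive topics. It also assumes each post's repost count is taken at a consistent point in time, since reposts stop accumulating once a post is deleted.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def deletion_rate_by_reposts(
    posts: Iterable[Tuple[int, bool]],
    edges: Sequence[int] = (0, 10, 100, 1000),  # hypothetical bucket lower bounds
) -> Dict[str, float]:
    """Deletion rate per repost-count bucket (descriptive only).

    posts: (repost_count, was_deleted) pairs.
    edges: ascending lower bounds of the buckets.
    """
    tallies: Dict[int, List[int]] = defaultdict(lambda: [0, 0])  # bucket -> [deleted, total]
    for reposts, deleted in posts:
        bucket = bisect_right(edges, reposts) - 1  # index of the bucket containing reposts
        if bucket < 0:
            raise ValueError(f"repost count {reposts} is below the first edge {edges[0]}")
        tallies[bucket][0] += int(deleted)
        tallies[bucket][1] += 1

    rates: Dict[str, float] = {}
    for bucket in sorted(tallies):  # only buckets that actually contain posts
        lo = edges[bucket]
        label = f"{lo}-{edges[bucket + 1] - 1}" if bucket + 1 < len(edges) else f"{lo}+"
        deleted, total = tallies[bucket]
        rates[label] = deleted / total
    return rates


# Synthetic input, NOT data from the paper.
sample = [(0, False), (3, True), (50, True), (50, False), (2000, True)]
print(deletion_rate_by_reposts(sample))
# {'0-9': 0.5, '10-99': 0.5, '1000+': 1.0}
```

Buckets with no posts are omitted rather than reported as 0, so an empty bucket is not mistaken for a bucket where nothing was censored.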
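Motivation and Goals
- Quantify the speed and scope of censorship on Chinese microblogging platforms such as Sina Weibo.
- Study how post deletions unfold over time after publication.
- Assess whether censorship is driven by keyword matching or influenced by a post's popularity.
- Use natural language processing (NLP) to analyze which topics are more likely to be censored.
- Compare targeted monitoring of sensitive users with results from broader sampling of the public timeline.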
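Proposed Method
- The researchers collected 2.38 million posts by repeatedly querying a set of sensitive users over two months, which enabled near-real-time detection of deletion events.
- A second, much larger dataset (470 million public-timeline posts) was used to check the findings against a more representative sample.
- Deletion events were identified by re-querying posts periodically and comparing whether each post was still available.
- NLP techniques were applied to classify posts by topic and to assess how often each topic was censored.
- The study analyzed how a post's spread (measured by its repost count) relates to its likelihood of deletion.
- Statistical models were used to test hypotheses about retrospective keyword filtering and time-based deletion patterns.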
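The detection step described in this list, polling repeatedly and comparing which posts are still available, can be sketched as a diff over successive snapshots of one user's posts. This is a minimal sketch under stated assumptions, not the authors' crawler: `detect_deletions` and `DeletionEvent` are hypothetical names, each snapshot is assumed to be the set of post IDs visible at one poll, and any post that disappears is treated as deleted. A real crawler would also need to confirm the absence by fetching the post directly, since posts can drop off a paged timeline, and this diff alone cannot tell censorship apart from a user deleting their own post. Because posts are only checked at poll times, the deletion time is known only to fall between the last poll that saw the post and the first poll that did not, so the sketch reports that window as latency bounds.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Set, Tuple


@dataclass
class DeletionEvent:
    post_id: str
    created_at: datetime
    last_seen: datetime      # last poll at which the post was still visible
    first_missing: datetime  # first poll at which the post was gone

    def latency_bounds(self) -> Tuple[timedelta, timedelta]:
        """The deletion happened after last_seen and no later than first_missing."""
        return (self.last_seen - self.created_at,
                self.first_missing - self.created_at)


def detect_deletions(
    snapshots: List[Tuple[datetime, Set[str]]],
    created_at: Dict[str, datetime],
) -> List[DeletionEvent]:
    """Diff consecutive polls of one user's posts (hypothetical helper).

    snapshots: (poll_time, IDs of posts visible at that poll), sorted by poll_time.
    created_at: posting time for every post ID that appears in any snapshot.
    """
    events: List[DeletionEvent] = []
    last_seen: Dict[str, datetime] = {}
    for poll_time, visible in snapshots:
        # A post seen at an earlier poll but absent now is treated as deleted.
        for post_id in [p for p in last_seen if p not in visible]:
            events.append(DeletionEvent(post_id, created_at[post_id],
                                        last_seen.pop(post_id), poll_time))
        # Record every post visible at this poll.
        for post_id in visible:
            last_seen[post_id] = poll_time
    return events
```

For example, a post created at 12:00 that is still visible at a 12:10 poll and missing at the 12:20 poll yields latency bounds of 10 to 20 minutes. The polling interval therefore sets the time resolution of every latency figure, which is why frequent revisits of a focused set of users give a sharper view than sparse sampling.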
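Experimental Results
Research Questions
- RQ1: How quickly are Weibo posts deleted after publication?
- RQ2: To what extent is censorship driven by keyword matching rather than by post popularity?
- RQ3: Are some topics more likely to be censored than others?
- RQ4: How do deletion rates differ between original posts and reposts?
- RQ5: How representative is targeted monitoring of sensitive users compared with sampling the public timeline?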
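Key Findings
- For original posts, nearly 30% of deletion events occur 5 to 30 minutes after the post is first published.
- Nearly 90% of deletions happen within 24 hours of posting.
- Original posts are deleted significantly faster than reposts, suggesting different moderation strategies for the two.
- Censorship is highly time-sensitive: deletions are heaviest in the first hour after posting.
- Topics related to politics, social unrest, and sensitive historical events are censored disproportionately often.
- The study indicates that Sina Weibo combines real-time keyword filtering with retrospective (after-the-fact) content review, and that the latter plays a significant role.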
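Findings such as "nearly 30% within 5 to 30 minutes" and "nearly 90% within 24 hours" are fractions of observed deletion latencies that fall inside a time window. A minimal sketch of that tabulation follows. The function names are hypothetical, latencies are assumed to already be measured in minutes (for example, the time from posting to the first poll at which the post was missing), and windows are treated as half-open intervals [lo, hi). The example latencies are synthetic and are not the paper's data.

```python
from typing import Dict, List


def fraction_in_window(latencies_min: List[float], lo: float, hi: float) -> float:
    """Share of deletions whose latency (in minutes) falls in [lo, hi)."""
    if not latencies_min:
        return 0.0
    return sum(lo <= x < hi for x in latencies_min) / len(latencies_min)


def summarize_latencies(latencies_min: List[float]) -> Dict[str, float]:
    """Fractions for the windows quoted in the findings (window choice is illustrative)."""
    return {
        "5-30 min": fraction_in_window(latencies_min, 5, 30),
        "first hour": fraction_in_window(latencies_min, 0, 60),
        "first 24 h": fraction_in_window(latencies_min, 0, 24 * 60),
    }


# Synthetic latencies in minutes, NOT data from the paper.
example = [2, 7, 12, 25, 45, 90, 300, 2000]
print(summarize_latencies(example))
# {'5-30 min': 0.375, 'first hour': 0.625, 'first 24 h': 0.875}
```

Because each latency is only known to within one polling interval, the choice between the lower and upper latency bound can shift a deletion across a window edge; reporting both is a simple way to show that uncertainty.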
This summary was generated by AI and reviewed by human editors.