QUICK REVIEW

[Paper Review] Applying Social Media Intelligence for Predicting and Identifying On-line Radicalization and Civil Unrest Oriented Threats

Swati Agarwal, Ashish Sureka|arXiv (Cornell University)|Nov 21, 2015

Hate Speech and Cyberbullying Detection | References: 76 | Cited by: 47

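One-Sentence Summary

This paper presents a comprehensive literature review of about 100 studies on using social media intelligence to detect online radicalization and civil unrest threats, covering techniques such as natural language processing, machine learning, and social network analysis. It identifies Twitter and YouTube as the main platforms for radicalization and protest mobilization, names clustering, logistic regression, and named entity recognition as key methods, and points to significant research gaps in multilingual and region-specific threat detection.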

ABSTRACT

Research shows that various social media platforms on the Internet such as Twitter, Tumblr (micro-blogging websites), Facebook (a popular social networking website), YouTube (the largest video sharing and hosting website), blogs and discussion forums are being misused by extremist groups for spreading their beliefs and ideologies, promoting radicalization, recruiting members and creating online virtual communities sharing a common agenda. Popular microblogging websites such as Twitter are being used as a real-time platform for information sharing and communication during the planning and mobilization of civil unrest related events. Applying social media intelligence for predicting and identifying online radicalization and civil unrest oriented threats is an area that has attracted several researchers' attention over the past 10 years. Several algorithms, techniques and tools have been proposed in the existing literature to counter and combat cyber-extremism and to predict protest-related events well in advance. In this paper, we conduct a literature review of all these existing techniques and do a comprehensive analysis to understand the state of the art, trends and research gaps. We present a one-class classification approach to collect scholarly articles targeting the topics and subtopics of our research scope. We perform characterization, classification and an in-depth meta-analysis of about 100 conference and journal papers to gain a better understanding of the existing literature.

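Research Motivation and Objectives

Analyze the state of the art in using social media intelligence to detect online radicalization and civil unrest threats.
Identify the dominant techniques, platforms, and features used in existing research on extremism and protest prediction.
Reveal the main research gaps in multilingual content analysis, regional specificity, and evaluation methodology.
Assess the effectiveness of machine learning and natural language processing (NLP) methods in detecting extremist content and predicting unrest events.
Conduct a meta-analysis of about 100 scholarly papers to guide future research in intelligence and security informatics.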

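Proposed Method

Use a one-class classification approach to collect and filter scholarly papers focused on detecting online radicalization and civil unrest.
Characterize, classify, and meta-analyze about 100 papers from ISI and SI conferences and journals.
Identify and categorize key techniques, including clustering, logistic regression, dynamic query expansion, and named entity recognition (NER).
Analyze how spatio-temporal metadata, contextual metadata, and user profile features (such as activity dynamics and demographic information) are used in threat detection.
Assess the role of graph modeling and social network analysis (SNA) in identifying hidden extremist groups.
Review the evaluation methods used across studies, such as precision, F1 score, and community detection performance.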
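
The one-class idea in the first step can be illustrated with a small sketch. This summary does not describe how the authors actually built their classifier, so everything below is an assumption made for illustration: the seed titles are invented, the bag-of-words centroid with a cosine threshold is a swapped-in stand-in for whatever one-class model the paper used, and the names `SEED_TITLES`, `is_relevant`, and the `threshold` value are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def centroid(docs):
    """Sum term counts over the positive-only seed documents."""
    total = Counter()
    for doc in docs:
        total.update(tokenize(doc))
    return total


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_relevant(title, seed_centroid, threshold=0.25):
    """Accept a candidate only if it is close enough to the seed class.

    Only in-scope examples are needed to build the centroid; anything
    that falls below the (assumed) threshold is treated as out of scope.
    """
    return cosine(Counter(tokenize(title)), seed_centroid) >= threshold


# Hypothetical in-scope seed titles (not taken from the paper).
SEED_TITLES = [
    "Detecting online radicalization on Twitter",
    "Predicting civil unrest events from social media",
    "Identifying extremist content in online forums",
]
SEED = centroid(SEED_TITLES)

if __name__ == "__main__":
    for candidate in [
        "Forecasting civil unrest from Twitter data",
        "Protein folding with deep networks",
    ]:
        print(candidate, "->", is_relevant(candidate, SEED))
```

The point of the sketch is that a one-class filter learns only from examples of the target topic, which suits literature collection where out-of-scope papers are too varied to enumerate.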

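Experimental Results

Research Questions

RQ1: Which techniques and platforms are used most often to detect online radicalization and civil unrest threats on social media?
RQ2: How effective are machine learning and natural language processing (NLP) methods at identifying extremist content and predicting protest events?
RQ3: Which types of metadata and features (e.g., spatio-temporal, linguistic, network-based) are most predictive of radicalization or unrest?
RQ4: How do existing studies vary in regional and linguistic scope, particularly in multilingual or non-English contexts?
RQ5: What are the main research gaps in current approaches, particularly regarding regional specificity and evaluation rigor?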

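Key Findings

Twitter is the most widely used platform for civil unrest prediction, owing to its real-time nature, short-text format, and follower-based propagation mechanism.
YouTube is a major platform for online radicalization, although it is underused in protest prediction research.
Named entity recognition (NER) is a common component of text processing pipelines for radicalization and unrest detection.
90% of the studies focus on English-language content, indicating a significant gap in multilingual and non-English text analysis.
60% of the studies target events in a specific country or region, most of them in Latin America and the United States.
Precision is the most commonly used evaluation metric, and social network analysis (SNA) is often used to detect hidden extremist groups.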
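
Since precision and F1 are the metrics named here, a short worked example may help show how they relate. This is a generic sketch of the standard definitions, not code from any surveyed study; the label vectors are made up, with 1 standing for "extremist or unrest-related" and 0 for everything else.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up labels for eight posts (illustrative only).
Y_TRUE = [1, 1, 1, 0, 0, 0, 1, 0]
Y_PRED = [1, 1, 0, 1, 0, 0, 1, 0]

if __name__ == "__main__":
    p, r, f = precision_recall_f1(Y_TRUE, Y_PRED)
    print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Precision alone says how many flagged items were correct but nothing about missed ones, which is why reporting it together with recall or F1 gives a fuller picture of a detector.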

This review was generated by AI and checked by a human editor.