QUICK REVIEW

[Paper Review] Crowdsourcing Cybersecurity: Cyber Attack Detection using Social Media

Rupinder Paul Khandpur, Taoran Ji | arXiv (Cornell University) | Feb 24, 2017

Network Security and Intrusion Detection | References: 39 | Cited by: 61

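One-Sentence Summary

An unsupervised framework that uses social media as a crowdsourced sensor to detect cyber-attacks (DDoS, data breaches, account hijacking), dynamically expanding seed queries through dependency-tree-based templates and word embeddings, and evaluated on large-scale Twitter data.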

ABSTRACT

Social media is often viewed as a sensor into various societal events such as disease outbreaks, protests, and elections. We describe the use of social media as a crowdsourced sensor to gain insight into ongoing cyber-attacks. Our approach detects a broad range of cyber-attacks (e.g., distributed denial of service (DDOS) attacks, data breaches, and account hijacking) in an unsupervised manner using just a limited fixed set of seed event triggers. A new query expansion strategy based on convolutional kernels and dependency parses helps model reporting structure and aids in identifying key event characteristics. Through a large-scale analysis over Twitter, we demonstrate that our approach consistently identifies and encodes events, outperforming existing methods.

Motivation and Objectives

Motivate the use of open social media signals as a sensor for cyber-attacks and reduce detection latency.
Develop an unsupervised framework that maps limited seed triggers to expanded queries to detect events.
Model reporting structure of cyber-attacks in social media via dependency parses and word embeddings.
Evaluate the approach on large-scale Twitter data across three attack categories (DDoS, data breach, account hijacking).

Proposed Method

Introduce Target Domain Generation to collect tweets syntactically and semantically similar to seed queries using a convolution tree kernel over dependency trees.
Propose Dynamic Typed Query Expansion, which iteratively expands seed queries by using KL divergence to select the candidate expansions that best distinguish the target domain from the global tweet collection.
Represent events as (Q_e, date, type) where Q_e is a set of expanded queries tied to a cyber-attack type.
Cluster the expanded queries into exemplars and assign each exemplar an attack type based on its similarity to the initial seeds.
Evaluate using a large GNIP Twitter dataset (Aug 2014–Oct 2016) with gold-standard reports from Hackmageddon and PrivacyRights.
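
As a rough illustration of the KL-divergence selection in the Dynamic Typed Query Expansion step above, the sketch below scores each candidate by its pointwise KL contribution, P(t | target) * log(P(t | target) / P(t | global)), and adds the top-scoring candidates to the seed set. It makes several assumptions that are not taken from the paper: plain tokens stand in for typed dependency queries, the global distribution uses add-one smoothing, only a single expansion round is shown, and the names kl_scores and expand_query are hypothetical. The paper's exact scoring may differ.

```python
import math
from collections import Counter


def kl_scores(target_docs, global_docs):
    """Score each term seen in the target domain by its pointwise KL
    contribution: P(t|target) * log(P(t|target) / P(t|global)).

    Assumption: documents are lists of tokens, and the global
    distribution is add-one smoothed over the joint vocabulary.
    """
    target_counts = Counter(t for doc in target_docs for t in doc)
    global_counts = Counter(t for doc in global_docs for t in doc)
    vocab_size = len(set(target_counts) | set(global_counts))
    target_total = sum(target_counts.values())
    global_total = sum(global_counts.values())

    scores = {}
    for term, count in target_counts.items():
        p_target = count / target_total
        p_global = (global_counts[term] + 1) / (global_total + vocab_size)
        scores[term] = p_target * math.log(p_target / p_global)
    return scores


def expand_query(seed, target_docs, global_docs, k=2):
    """Return the seed terms plus the k highest-scoring new terms
    (one expansion round; ties are broken alphabetically)."""
    scores = kl_scores(target_docs, global_docs)
    candidates = [t for t in scores if t not in seed]
    ranked = sorted(candidates, key=lambda t: (-scores[t], t))
    return set(seed) | set(ranked[:k])


# Toy data (hypothetical): two target-domain tweets inside a larger collection.
target = [["ddos", "attack", "site"], ["ddos", "site", "down"]]
collection = target + [
    ["site", "launch", "today"],
    ["good", "morning", "coffee"],
    ["heart", "attack", "news"],
]
expanded = expand_query({"attack"}, target, collection, k=2)
```

In an iterative setting, the expanded set would be used to re-collect the target domain before scoring again; that loop is omitted here to keep the sketch short.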

Experimental Results

Research Questions

RQ1: Can a small set of seed typed dependency queries be expanded dynamically to cover a broad range of cyber-attack reports in social media?
RQ2: Does a convolution-tree kernel plus word embedding-based similarity improve target domain generation over naive keyword methods?
RQ3: How well can unsupervised, seed-driven query expansion detect and characterize data breaches, account hijackings, and DDoS events in Twitter?
RQ4: What are the precision/recall trade-offs of the proposed method compared to a traditional burst-detection baseline?
RQ5: Can detected events be matched to established ground-truth cyber-attack datasets to validate performance?

Key Findings

The method achieves roughly 0.78 precision and 0.74 recall for data breaches, 0.80 precision and 0.45 recall for DDoS events, and 0.66 precision and 0.56 recall for account hijacking.
Recall is higher for data breaches (approximately 0.75) than for DDoS or account hijacking, because reports of the latter two have shorter signal lifecycles.
The baseline, Kleinberg burst detection on fixed keywords, aligns less closely with ground truth than the typed dynamic query expansion approach.
The approach detects additional events not listed in gold-standard sources, indicating discovery of new cyber-attack reports from social media.
Case studies demonstrate detection of high-profile incidents (e.g., Ashley Madison data breach, Sony/Dyn DDoS, CentCom account hijacking) with interpretable expanded queries.
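
To compare the three attack types on a single number, the precision and recall figures listed above can be combined into an F1 score (the harmonic mean of the two). The F1 values are derived here from the approximate figures in this summary and are not reported as such in it; the helper name f1_score is hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Approximate (precision, recall) pairs as listed in the findings above.
reported = {
    "data breach": (0.78, 0.74),
    "DDoS": (0.80, 0.45),
    "account hijacking": (0.66, 0.56),
}

# Derived values, not figures taken from the paper.
f1_by_type = {name: f1_score(p, r) for name, (p, r) in reported.items()}

for name, f1 in f1_by_type.items():
    print(f"{name}: F1 ~ {f1:.2f}")
```

Under these inputs the derived F1 is about 0.76 for data breaches, 0.58 for DDoS, and 0.61 for account hijacking, which reflects how the low DDoS recall offsets its high precision.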

This review was generated by AI and reviewed by human editors.