
[Paper Digest] Crowdsourcing Cybersecurity: Cyber Attack Detection using Social Media

Rupinder Paul Khandpur, Taoran Ji|arXiv (Cornell University)|Feb 24, 2017
Network Security and Intrusion Detection | References: 39 | Cited by: 61
One-Sentence Summary

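An unsupervised framework that uses social media as a crowdsourced sensor to detect cyber-attacks (DDoS, data breaches, account hijacking). It dynamically expands seed queries using dependency-tree-based templates and word embeddings, and is evaluated on large-scale Twitter data.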

ABSTRACT

Social media is often viewed as a sensor into various societal events such as disease outbreaks, protests, and elections. We describe the use of social media as a crowdsourced sensor to gain insight into ongoing cyber-attacks. Our approach detects a broad range of cyber-attacks (e.g., distributed denial of service (DDOS) attacks, data breaches, and account hijacking) in an unsupervised manner using just a limited fixed set of seed event triggers. A new query expansion strategy based on convolutional kernels and dependency parses helps model reporting structure and aids in identifying key event characteristics. Through a large-scale analysis over Twitter, we demonstrate that our approach consistently identifies and encodes events, outperforming existing methods.

Motivation and Objectives

  • Motivate the use of open social media signals as a sensor for cyber-attacks and reduce detection latency.
  • Develop an unsupervised framework that maps limited seed triggers to expanded queries to detect events.
  • Model reporting structure of cyber-attacks in social media via dependency parses and word embeddings.
  • Evaluate the approach on large-scale Twitter data across three attack categories (DDoS, data breach, account hijacking).

Proposed Method

  • Introduce Target Domain Generation to collect tweets syntactically and semantically similar to seed queries using a convolution tree kernel over dependency trees.
  • Propose Dynamic Typed Query Expansion that iteratively expands seed queries by selecting candidate expansions via KL divergence to distinguish target domain from the global tweet collection.
  • Represent events as (Q_e, date, type) where Q_e is a set of expanded queries tied to a cyber-attack type.
  • Cluster exemplars of expanded queries and annotate exemplars to attack types based on similarity to initial seeds.
  • Evaluate using a large GNIP Twitter dataset (Aug 2014–Oct 2016) with gold-standard reports from Hackmageddon and PrivacyRights.
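The KL-divergence selection step can be illustrated with a minimal sketch. It is not the authors' implementation: it simplifies typed dependency queries to plain whitespace tokens, uses add-one smoothing, and ranks each candidate by its pointwise contribution to KL(target || global). The names `term_distribution`, `kl_contributions`, `top_expansions`, and the `Event` record are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter
from datetime import date
from typing import NamedTuple


class Event(NamedTuple):
    """Illustrative container for the (Q_e, date, type) event tuple."""
    queries: frozenset   # Q_e: the expanded query set
    day: date            # date of the event
    attack_type: str     # e.g. "ddos", "data breach", "account hijacking"


def term_distribution(tweets, vocab, alpha=1.0):
    """Smoothed unigram distribution over `vocab` (assumption: add-alpha smoothing)."""
    counts = Counter(tok for t in tweets for tok in t.lower().split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_contributions(target_tweets, global_tweets):
    """Per-term contribution p(w) * log(p(w) / q(w)) to KL(target || global)."""
    vocab = {tok for t in target_tweets + global_tweets for tok in t.lower().split()}
    p = term_distribution(target_tweets, vocab)
    q = term_distribution(global_tweets, vocab)
    return {w: p[w] * math.log(p[w] / q[w]) for w in vocab}


def top_expansions(target_tweets, global_tweets, seeds, k=3):
    """Rank non-seed terms by how strongly they separate target from global tweets."""
    scores = kl_contributions(target_tweets, global_tweets)
    candidates = [w for w in scores if w not in seeds]
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(candidates, key=lambda w: (-scores[w], w))[:k]
```

Terms that are frequent in the target domain but rare in the global collection receive the largest positive contributions, which is the intuition behind using KL divergence to pick expansions. An iterative version would add the selected terms to the seed set, rebuild the target domain, and repeat.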

Experimental Results

Research Questions

  • RQ1: Can a small set of seed typed dependency queries be expanded dynamically to cover a broad range of cyber-attack reports in social media?
  • RQ2: Does a convolution tree kernel plus word-embedding-based similarity improve target domain generation over naive keyword methods?
  • RQ3: How well can unsupervised, seed-driven query expansion detect and characterize data breaches, account hijackings, and DDoS events on Twitter?
  • RQ4: What are the precision/recall trade-offs of the proposed method compared to a traditional burst-detection baseline?
  • RQ5: Can detected events be matched to established ground-truth cyber-attack datasets to validate performance?

Key Findings

  • The method achieves about 0.78 precision and 0.74 recall for data breaches, 0.80 precision and 0.45 recall for DDoS events, and 0.66 precision and 0.56 recall for account hijacking.
  • Recall is higher for data breaches (approximately 0.75) than for DDoS (0.45) or account hijacking (0.56), which is attributed to the shorter signal lifecycles of the latter two attack types.
  • The Kleinberg burst-detection baseline, run on fixed keywords, aligns less well with ground truth than the typed dynamic query expansion approach.
  • The approach detects additional events not listed in gold-standard sources, indicating discovery of new cyber-attack reports from social media.
  • Case studies demonstrate detection of high-profile incidents (e.g., Ashley Madison data breach, Sony/Dyn DDoS, CentCom account hijacking) with interpretable expanded queries.
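To compare the three attack types with a single number, the reported precision and recall can be combined into an F1 score. The sketch below is only arithmetic on the figures quoted above; F1 values are not stated in this digest, and the helper name `f1_score` is hypothetical.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as quoted in the findings above.
reported = {
    "data breach": (0.78, 0.74),
    "DDoS": (0.80, 0.45),
    "account hijacking": (0.66, 0.56),
}

# Derived values, not figures reported by the digest.
f1_by_type = {name: f1_score(p, r) for name, (p, r) in reported.items()}

for name, f1 in f1_by_type.items():
    print(f"{name}: F1 = {f1:.2f}")
```

Because F1 is a harmonic mean, the low DDoS recall pulls its score down despite DDoS having the highest precision of the three.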

This digest was generated by AI and reviewed by human editors.