QUICK REVIEW

[Paper Review] BitcoinHeist: Topological Data Analysis for Ransomware Detection on the Bitcoin Blockchain

Cüneyt Gürcan Akçora, Yitao Li|arXiv (Cornell University)|Jun 19, 2019

Topological and Geometric Data Analysis | References: 29 | Citations: 46

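One-sentence summary

This paper applies a Mapper-based topological data analysis (TDA) framework to the heterogeneous Bitcoin transaction graph to automatically detect ransomware-related addresses and predict new ransomware families. It reports higher precision and recall than heuristic methods.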

ABSTRACT

Proliferation of cryptocurrencies (e.g., Bitcoin) that allow pseudo-anonymous transactions, has made it easier for ransomware developers to demand ransom by encrypting sensitive user data. The recently revealed strikes of ransomware attacks have already resulted in significant economic losses and societal harm across different sectors, ranging from local governments to health care. Most modern ransomware use Bitcoin for payments. However, although Bitcoin transactions are permanently recorded and publicly available, current approaches for detecting ransomware depend only on a couple of heuristics and/or tedious information gathering steps (e.g., running ransomware to collect ransomware related Bitcoin addresses). To our knowledge, none of the previous approaches have employed advanced data analytics techniques to automatically detect ransomware related transactions and malicious Bitcoin addresses. By capitalizing on the recent advances in topological data analysis, we propose an efficient and tractable data analytics framework to automatically detect new malicious addresses in a ransomware family, given only a limited records of previous transactions. Furthermore, our proposed techniques exhibit high utility to detect the emergence of new ransomware families, that is, ransomware with no previous records of transactions. Using the existing known ransomware data sets, we show that our proposed methodology provides significant improvements in precision and recall for ransomware transaction detection, compared to existing heuristic based approaches, and can be utilized to automate ransomware detection.

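Motivation and objectives

Identify Bitcoin addresses associated with ransomware payments using scalable, data-driven features.
Assess whether known ransomware families show consistent blockchain behavior over time.
Assess how similar the behavior of different ransomware operators is on the Bitcoin graph.
Enable detection of undisclosed ransomware payments and of the emergence of new ransomware families.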

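Proposed method

Model the Bitcoin blockchain as a heterogeneous graph with two node types (addresses and transactions).
For each address, extract six graph features within each 24-hour window (income, neighbors, weight, length, count, loop).
Formulate two problems, detection of existing families and prediction of new families, in both of which past labeled data is used to predict future unseen data.
Apply Mapper-based topological data analysis to build a cluster-graph representation, then identify suspicious addresses by multi-attribute filtering (Algorithm 1).
Standardize the features and compare against heuristic baselines (co-spending and transition) and conventional machine-learning methods: clustering (DBSCAN, hierarchical) and tree-based classifiers (XGBoost, random forest).
Use the 24-hour window approach to capture the spatio-temporal dynamics of transactions and keep the data manageable at scale.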
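
As a rough illustration of the first two steps above, the following Python sketch represents each transaction as a node linking input addresses to output addresses and computes two of the six features per address and 24-hour window. The `Transaction` class, the `window_features` function, and the exact feature definitions (income as the total amount received, neighbors as the number of transactions paying the address) are assumptions made for this example, not the paper's implementation. The remaining four features (weight, length, count, loop) are omitted.

```python
from collections import defaultdict
from dataclasses import dataclass

DAY_SECONDS = 24 * 60 * 60


@dataclass(frozen=True)
class Transaction:
    """A transaction node; its edges go to address nodes (hypothetical type)."""
    txid: str
    timestamp: int   # Unix time in seconds
    inputs: tuple    # addresses that spend coins in this transaction
    outputs: tuple   # (address, amount) pairs that receive coins


def window_features(transactions):
    """Return {(day, address): {"income": ..., "neighbors": ...}}.

    Assumed definitions for this sketch:
      income    = total amount received by the address within the window
      neighbors = number of transactions in the window that pay the address
    """
    features = defaultdict(lambda: {"income": 0, "neighbors": 0})
    for tx in transactions:
        day = tx.timestamp // DAY_SECONDS   # index of the 24-hour window
        paid = set()
        for address, amount in tx.outputs:
            features[(day, address)]["income"] += amount
            paid.add(address)
        for address in paid:                # count each transaction once
            features[(day, address)]["neighbors"] += 1
    return dict(features)


# Toy data: amounts are integers (for example, satoshis).
txs = [
    Transaction("t1", 0, ("a1",), (("a2", 50), ("a3", 25))),
    Transaction("t2", 3600, ("a3",), (("a2", 10),)),
    Transaction("t3", DAY_SECONDS + 5, ("a2",), (("a4", 60),)),
]
feats = window_features(txs)
```

Keying the result by `(day, address)` means an address that is active on several days yields one feature vector per day, which is one plausible way to read the 24-hour window design.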

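Experimental results

Research questions

RQ1: Which features of the Bitcoin network best detect ransomware behavior?
RQ2: Do specific ransomware families show consistent blockchain behavior over time?
RQ3: How similar is the behavior of different ransomware operators on the Bitcoin blockchain?
RQ4: Can payments that were not reported to authorities or analytics companies be detected?
RQ5: Can the emergence of new ransomware families be detected from existing data?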

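Key findings

The framework improves precision and recall for ransomware transaction detection compared with heuristic methods.
It can detect undisclosed payments to addresses of known ransomware families and predict the emergence of new ransomware families.
The large ransomware dataset, which combines the Montreal, Princeton, and Padua sources, contains 24,486 addresses from 27 families that appear multiple times over time.
Six graph features are computed and analyzed per address. The top ransomware patterns differ statistically from non-ransomware patterns (p < 2.2e-16).
Mapper-based TDA reveals hidden links between address clusters that conventional clustering misses, enabling targeted suspicion scoring of addresses (Algorithm 1).
The study applies daily 24-hour windows to Bitcoin data from 2009–2018 to examine address recurrence and behavior.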
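
To make the Mapper idea in the findings more concrete, here is a minimal pure-Python sketch of the generic Mapper construction: project points with a filter (lens) function, cover the filter range with overlapping intervals, cluster the points in each interval, and connect clusters that share points. The function names, the single-linkage clustering, the parameter values, and the `suspicious` scoring rule (flag unlabeled addresses in clusters where known ransomware addresses reach a minimum share) are all assumptions for illustration. The scoring rule is a simplified stand-in, not the paper's Algorithm 1.

```python
from itertools import combinations
from math import dist


def single_linkage(points, ids, eps):
    """Group ids into clusters by chaining points that lie within eps."""
    clusters, unvisited = [], set(ids)
    while unvisited:
        stack = [unvisited.pop()]
        cluster = set(stack)
        while stack:
            i = stack.pop()
            near = {j for j in unvisited if dist(points[i], points[j]) <= eps}
            unvisited -= near
            cluster |= near
            stack.extend(near)
        clusters.append(frozenset(cluster))
    return clusters


def mapper(points, lens, n_intervals, overlap, eps):
    """Return (nodes, edges): nodes are clusters, edges join clusters sharing points."""
    values = [lens(p) for p in points]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals
    pad = width * overlap                      # how far intervals extend into neighbors
    nodes = []
    for k in range(n_intervals):
        a, b = lo + k * width - pad, lo + (k + 1) * width + pad
        ids = [i for i, v in enumerate(values) if a <= v <= b]
        nodes.extend(single_linkage(points, ids, eps))
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]}
    return nodes, edges


def suspicious(nodes, labeled, min_share):
    """Flag unlabeled members of clusters whose labeled share is at least min_share."""
    flagged = set()
    for members in nodes:
        if len(members & labeled) / len(members) >= min_share:
            flagged |= members - labeled
    return flagged


# Toy feature vectors (assumed already standardized); index = address id.
points = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1),
          (1.0, 1.0), (1.1, 1.0), (1.2, 1.1),
          (0.6, 0.5)]
nodes, edges = mapper(points, lens=lambda p: p[0],
                      n_intervals=2, overlap=0.25, eps=0.7)
flagged = suspicious(nodes, labeled={0, 1}, min_share=0.5)
```

In this toy example, address 6 falls in the overlap of the two intervals, so it belongs to both clusters and produces the edge that links them. That shared membership is the kind of connection between clusters that a partition-based clustering would not expose.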


This review was generated by AI and checked by a human editor.