QUICK REVIEW

[Paper Review] Automated Ransomware Behavior Analysis: Pattern Extraction and Early Detection

Qian Chen, Sheikh Rabiul Islam|arXiv (Cornell University)|Jan 1, 2019

Advanced Malware Detection Techniques | References: 12 | Citations: 3

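One-Line Summary

This paper presents an automated ransomware behavior analysis tool that extracts features from system logs to enable early detection and forensic visualization. Feature extraction succeeded with all three approaches tested: TF-IDF, Fisher's LDA, and extra trees (ET). ET was the most robust and efficient, while TF-IDF best identified the key malware patterns across varying volumes of normal logs.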

ABSTRACT

Security operation centers (SOCs) typically use a variety of tools to collect large volumes of host logs for detection and forensic of intrusions. Our experience, supported by recent user studies on SOC operators, indicates that operators spend ample time (e.g., hundreds of man-hours) on investigations into logs seeking adversarial actions. Similarly, reconfiguration of tools to adapt detectors for future similar attacks is commonplace upon gaining novel insights (e.g., through internal investigation or shared indicators). This paper presents an automated malware pattern-extraction and early detection tool, testing three machine learning approaches: TF-IDF (term frequency-inverse document frequency), Fisher's LDA (linear discriminant analysis) and ET (extra trees/extremely randomized trees) that can (1) analyze freshly discovered malware samples in sandboxes and generate dynamic analysis reports (host logs); (2) automatically extract the sequence of events induced by malware given a large volume of ambient (un-attacked) host logs, and the relatively few logs from hosts that are infected with potentially polymorphic malware; (3) rank the most discriminating features (unique patterns) of malware and from the learned behavior detect malicious activity; and (4) allows operators to visualize the discriminating features and their correlations to facilitate malware forensic efforts. To validate the accuracy and efficiency of our tool, we design three experiments and test seven ransomware attacks (i.e., WannaCry, DBGer, Cerber, Defray, GandCrab, Locky, and nRansom). The experimental results show that TF-IDF is the best of the three methods to identify discriminating features, and ET is the most time-efficient and robust approach.

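Motivation and Objectives

Address the ransomware threat, which is especially severe for small and medium-sized businesses with limited cybersecurity resources.
Reduce manual forensic work in security operations center (SOC) environments, which currently takes hundreds of hours per incident.
Automatically extract malicious behavior patterns from system logs to enable early ransomware detection.
Visualize discriminating features and their correlations to support malware forensics and response planning.
Develop a scalable, automated solution that does not depend on manual reverse engineering or analyst expertise.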

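Proposed Method

Used Cuckoo Sandbox to generate dynamic analysis logs from seven ransomware samples (WannaCry, DBGer, Cerber, Defray, GandCrab, Locky, nRansom) and from emulated normal user activity.
Applied TF-IDF, Fisher's LDA, and extra trees (ET) to extract discriminating features from malware-induced behavior.
Trained on a mix of infected-host logs (malicious) and uninfected-host logs (normal) to identify the features that differ between the two.
Ranked features by discriminating power using model-specific weights (e.g., TF-IDF scores, LDA class separability, ET feature importance).
Visualized decision paths within the ET model to reveal feature correlations and the hierarchical detection logic.
Validated performance with three experiments: robustness of feature ranking, model comparison, and early detection accuracy on unseen logs.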
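
As a rough illustration of the TF-IDF ranking step listed above, the following minimal sketch scores the event tokens of one infected-host log against a few normal logs. It is not the authors' implementation: the event names, the log contents, and the helper `tfidf_rank` are all hypothetical, and the weighting is the plain textbook form (term frequency times the log of inverse document frequency).

```python
import math
from collections import Counter


def tfidf_rank(target_doc, corpus):
    """Rank the terms of target_doc by TF-IDF, highest score first.

    corpus is a list of token lists and must include target_doc itself,
    so every term in target_doc has a document frequency of at least 1.
    """
    n_docs = len(corpus)
    # Document frequency: in how many logs does each event appear?
    df = Counter()
    for doc in corpus:
        df.update(set(doc))
    tf = Counter(target_doc)
    total = len(target_doc)
    scores = {}
    for term, count in tf.items():
        idf = math.log(n_docs / df[term])
        scores[term] = (count / total) * idf
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical event tokens; real sandbox reports are far richer.
normal_logs = [
    ["file_open", "file_read", "net_connect", "file_close"],
    ["file_open", "file_write", "file_close", "reg_read"],
    ["net_connect", "file_open", "file_read", "reg_read"],
]
infected_log = [
    "file_open", "crypto_acquire_context", "file_write",
    "file_rename", "file_rename", "shadow_copy_delete",
]

ranking = tfidf_rank(infected_log, normal_logs + [infected_log])
top_terms = [term for term, _ in ranking[:3]]
print(top_terms)
```

Events that occur only in the infected log rise to the top, while an event present in every log (`file_open` here) scores zero. That matches the intuition behind using TF-IDF to separate malware-induced events from ambient activity.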

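Experimental Results

Research Questions

RQ1: Can machine learning models automatically extract the most discriminating behavior patterns for separating ransomware behavior from normal activity in system logs?
RQ2: How do TF-IDF, Fisher's LDA, and ET compare at identifying and ranking malicious features across varying volumes of normal system logs?
RQ3: Is the feature ranking stable and robust when the training data contains normal host logs of differing volume and quality?
RQ4: Can the ET model detect the targeted ransomware behavior before encryption occurs, with high precision and reasonable recall?
RQ5: Can decision tree visualization improve the forensic understanding of ransomware behavior and support response planning?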
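
One simple way to make the ranking-stability question (RQ3) concrete is to compare the top-k features a method selects under two training scenarios. The sketch below uses Jaccard overlap of the top-k sets. This review does not say which stability measure the paper uses, so the metric, the helper `top_k_overlap`, and the example rankings are all assumptions for illustration.

```python
def top_k_overlap(ranking_a, ranking_b, k):
    """Jaccard overlap between the top-k entries of two ranked feature lists."""
    top_a, top_b = set(ranking_a[:k]), set(ranking_b[:k])
    union = top_a | top_b
    # Two empty top-k sets are treated as identical.
    return len(top_a & top_b) / len(union) if union else 1.0


# Hypothetical rankings from three scenarios with different amounts of normal logs.
rank_c1 = ["file_rename", "shadow_copy_delete", "crypto_acquire_context",
           "file_write", "reg_read"]
rank_c2 = ["shadow_copy_delete", "file_rename", "crypto_acquire_context",
           "reg_read", "file_write"]
rank_c3 = ["file_rename", "net_connect", "reg_read",
           "crypto_acquire_context", "file_write"]

stable = top_k_overlap(rank_c1, rank_c2, k=3)    # same top-3 set, different order
unstable = top_k_overlap(rank_c1, rank_c3, k=3)  # only one shared top-3 feature
print(stable, unstable)
```

A value near 1 means the method picks the same leading features regardless of how much normal traffic is mixed in. A value near 0 means the ranking depends heavily on the training mix.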

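Key Findings

TF-IDF outperformed Fisher's LDA and ET at identifying the feature set best suited to discriminating ransomware behavior.
ET was the most robust model, maintaining a consistent feature ranking across varying volumes of normal system logs (scenarios C1, C2, and C3).
Fisher's LDA produced markedly different feature rankings as the volume of normal logs changed, showing low robustness.
The ET model achieved a precision of 1.0 on all seven ransomware samples, meaning early detection produced no false positives.
GandCrab recorded the highest detection accuracy (0.999) and F-score (0.999), whereas DBGer had the lowest recall (0.308), indicating that detection remains difficult for some versions.
Visualizing the ET decision trees clearly exposed the order of malicious actions and their correlations, which aided forensic analysis.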
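
To see how a perfect precision can coexist with a low recall, as in the figures above, the following sketch computes precision, recall, and F-score from confusion counts. The counts are hypothetical, chosen only so that recall rounds to 0.308. They are not taken from the paper, and the resulting F-score is an illustration rather than a reported result.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts: no false alarms, but most malicious events are missed.
p, r, f1 = precision_recall_f1(tp=4, fp=0, fn=9)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

With zero false positives the precision stays at 1.0 no matter how many malicious events go undetected, so recall and F-score are the numbers that reveal missed detections.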

This review was generated by AI and checked by a human editor.