QUICK REVIEW

[Paper Review] Fast & Furious: Modelling Malware Detection as Evolving Data Streams

Fabrício Ceschin, Marcus Botacin|arXiv (Cornell University)|May 24, 2022

Advanced Malware Detection Techniques|Citations: 1

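One-Line Summary

This paper proposes a new data stream learning pipeline for Android malware detection that jointly adapts the classifier and the feature extractor in response to concept drift and feature drift. Training on Word2Vec and TF-IDF features from about 415K Android apps in DREBIN and AndroZoo, collected between 2009 and 2018, the authors show that updating both components when drift occurs improves the F1 score by 22.05 points on DREBIN and 8.77 points on AndroZoo, outperforming static models and classifier-only updates.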

ABSTRACT

Malware is a major threat to computer systems and imposes many challenges to cyber security. Targeted threats, such as ransomware, cause millions of dollars in losses every year. The constant increase of malware infections has been motivating popular antiviruses (AVs) to develop dedicated detection strategies, which include meticulously crafted machine learning (ML) pipelines. However, malware developers unceasingly change their samples' features to bypass detection. This constant evolution of malware samples causes changes to the data distribution (i.e., concept drifts) that directly affect ML model detection rates, something not considered in the majority of the literature work. In this work, we evaluate the impact of concept drift on malware classifiers for two Android datasets: DREBIN (about 130K apps) and a subset of AndroZoo (about 285K apps). We used these datasets to train an Adaptive Random Forest (ARF) classifier, as well as a Stochastic Gradient Descent (SGD) classifier. We also ordered all datasets samples using their VirusTotal submission timestamp and then extracted features from their textual attributes using two algorithms (Word2Vec and TF-IDF). Then, we conducted experiments comparing both feature extractors, classifiers, as well as four drift detectors (DDM, EDDM, ADWIN, and KSWIN) to determine the best approach for real environments. Finally, we compare some possible approaches to mitigate concept drift and propose a novel data stream pipeline that updates both the classifier and the feature extractor. To do so, we conducted a longitudinal evaluation by (i) classifying malware samples collected over nine years (2009-2018), (ii) reviewing concept drift detection algorithms to attest its pervasiveness, (iii) comparing distinct ML approaches to mitigate the issue, and (iv) proposing an ML data stream pipeline that outperformed literature approaches.
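The ordering and feature-extraction steps described in the abstract can be illustrated with a minimal standard-library sketch. The samples, dates, and token names below are hypothetical, and the smoothed IDF formula is one common variant rather than necessarily the one the authors use. The extractor is fitted on the oldest samples only and then applied to a newer one.

```python
import math
from collections import Counter

# Hypothetical toy samples: (submission date, textual attributes).
samples = [
    ("2012-03-01", ["INTERNET", "SEND_SMS"]),
    ("2010-07-15", ["INTERNET"]),
    ("2011-01-20", ["INTERNET", "READ_CONTACTS"]),
]

# ISO-formatted dates sort chronologically as plain strings,
# so the sorted list can be replayed as a stream.
stream = sorted(samples, key=lambda s: s[0])


def fit_idf(docs):
    """Smoothed inverse document frequency for every token seen in docs."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def tfidf(doc, idf):
    """TF-IDF weights for one document; tokens outside the vocabulary are dropped."""
    counts = Counter(tok for tok in doc if tok in idf)
    total = sum(counts.values()) or 1
    return {tok: (c / total) * idf[tok] for tok, c in counts.items()}


idf = fit_idf([doc for _, doc in stream[:2]])  # fit on the two oldest samples
vector = tfidf(stream[2][1], idf)              # transform the newest sample
```

In this toy run the newest sample contains `SEND_SMS`, which the fitted vocabulary has never seen, so the token vanishes from `vector`. That is a small-scale picture of why a feature extractor fitted once can go stale as new attributes appear.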

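Motivation and Objectives

Investigate whether concept drift is a general phenomenon across Android malware datasets rather than something limited to isolated cases.
Evaluate whether updating the feature extractor, in addition to the classifier, is necessary to sustain high detection accuracy over long periods once concept drift occurs.
Identify the best combination of feature extraction method, classifier, and drift detector for long-term malware detection.
Propose and validate a malware detection pipeline that adapts in real time and mitigates the performance degradation caused by evolving malware.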

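Proposed Method

The authors collect about 415K Android apps from DREBIN and AndroZoo and order them by VirusTotal submission timestamp to simulate a real-world data stream.
Textual attributes (e.g., permissions, API calls) are turned into features with two representation methods, TF-IDF and Word2Vec.
Two classifiers, Adaptive Random Forest (ARF) and Stochastic Gradient Descent (SGD), are trained and then updated whenever concept drift is detected.
Four drift detectors (DDM, EDDM, ADWIN, and KSWIN) are evaluated for identifying concept drift in the evolving data stream.
When drift is detected, the proposed pipeline updates both the classifier and the feature extractor, so that the representation and the predictions both adapt.
The system is implemented as an extension to scikit-multiflow, which supports reproducibility and extension in future research.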
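The drift-triggered update of both components can be sketched in plain Python. This is a toy illustration under stated assumptions, not the authors' implementation and not the scikit-multiflow API: `DDM` is a simplified detector in the style of the Drift Detection Method, and `VocabExtractor`, `TokenVoteClassifier`, `rebuild`, and `run_stream` are hypothetical names standing in for the real feature extractors and classifiers.

```python
import math
from collections import Counter, deque


class DDM:
    """Simplified DDM-style detector: signals drift when the running error
    rate climbs well above the lowest level observed so far."""

    def __init__(self, min_samples=30):
        self.min_samples = min_samples
        self.reset()

    def reset(self):
        self.n = self.errors = 0
        self.best = float("inf")          # lowest p + s seen so far
        self.best_p = self.best_s = 0.0

    def update(self, error):
        """Feed 1 for a misclassification, 0 otherwise; True means drift."""
        self.n += 1
        self.errors += error
        p = self.errors / self.n
        s = math.sqrt(p * (1 - p) / self.n)
        if self.n < self.min_samples:
            return False
        if p + s < self.best:
            self.best, self.best_p, self.best_s = p + s, p, s
        return p + s > self.best_p + 3 * self.best_s


class VocabExtractor:
    """Toy feature extractor: keeps only tokens present when it was fitted."""

    def fit(self, docs):
        self.vocab = {tok for doc in docs for tok in doc}
        return self

    def transform(self, doc):
        return [tok for tok in doc if tok in self.vocab]


class TokenVoteClassifier:
    """Toy incremental classifier: each token tallies +1 per malware sample
    and -1 per benign sample; prediction sums the tallies."""

    def __init__(self):
        self.votes = Counter()

    def partial_fit(self, tokens, label):
        for tok in tokens:
            self.votes[tok] += 1 if label == 1 else -1

    def predict(self, tokens):
        return 1 if sum(self.votes[tok] for tok in tokens) > 0 else 0


def rebuild(window):
    """Refit the extractor AND retrain a fresh classifier on recent samples."""
    extractor = VocabExtractor().fit([doc for doc, _ in window])
    clf = TokenVoteClassifier()
    for doc, label in window:
        clf.partial_fit(extractor.transform(doc), label)
    return extractor, clf


def run_stream(stream, warmup=20, window_size=50):
    """Test-then-train loop; returns the stream indices where drift fired."""
    window = deque(stream[:warmup], maxlen=window_size)
    extractor, clf = rebuild(window)
    detector, drifts = DDM(), []
    for i, (doc, label) in enumerate(stream[warmup:], start=warmup):
        pred = clf.predict(extractor.transform(doc))          # test first
        window.append((doc, label))
        if detector.update(int(pred != label)):
            drifts.append(i)
            extractor, clf = rebuild(window)                  # update both parts
            detector.reset()
        else:
            clf.partial_fit(extractor.transform(doc), label)  # then train
    return drifts


# Toy stream: malware switches from SEND_SMS to a previously unseen token.
old = [(["INTERNET"], 0), (["SEND_SMS"], 1)] * 50
new = [(["INTERNET"], 0), (["LOAD_DEX"], 1)] * 50
drifts = run_stream(old + new)
```

In this noise-free toy stream the first sample carrying the unseen `LOAD_DEX` token is misclassified, which is enough to trip the detector. Retraining only the classifier would not help here, because the stale vocabulary would keep discarding `LOAD_DEX` before the classifier ever saw it. Refitting the extractor on the recent window is what makes the new token usable.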

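Experimental Results

Research Questions

RQ1: Is concept drift a general phenomenon across diverse Android malware datasets, rather than one limited to a particular data distribution?
RQ2: When concept drift occurs, is updating the feature extractor in addition to the classifier essential for sustaining high detection accuracy over long periods?
RQ3: Which combination of feature extraction method, classifier, and drift detector gives the best long-term performance in malware detection?
RQ4: How does the timing of model updates (drift-triggered vs. fixed-window) affect detection performance?
RQ5: To what extent does malware evolution correlate with changes in the Android ecosystem?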

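Key Findings

Resetting the classifier after concept drift is detected outperforms periodic resets based on fixed time windows.
Updating both the classifier and the feature extractor when drift occurs yields the best detection performance, improving the F1 score by 22.05 points on the DREBIN dataset.
The gain from adapting both components together is also notable on the larger AndroZoo dataset, where the F1 score rises by 8.77 points.
The KSWIN drift detector outperformed DDM, EDDM, and ADWIN on both the DREBIN and AndroZoo datasets.
The study confirms that malware evolution causes both concept drift and feature drift: new API calls, permissions, and vocabulary emerged over time.
The proposed pipeline is released as an extension to scikit-multiflow, which should ease community adoption and further research on adaptive malware detection.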
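For readers less familiar with how an F1 gain "in points" is read, the following sketch computes F1 from confusion counts and expresses the difference between two models in percentage points. The counts are hypothetical and chosen only for illustration. They are not taken from the paper.

```python
def f1_score(tp, fp, fn):
    """F1 is the harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical confusion counts for a static model and an adapted model.
baseline = f1_score(tp=60, fp=20, fn=40)
adapted = f1_score(tp=80, fp=20, fn=20)

# A gain "in points" is the absolute difference on a 0-100 scale.
gain_points = (adapted - baseline) * 100
```

A gain in points is an absolute difference between two F1 values, not a relative improvement, so it is worth keeping that distinction in mind when comparing the DREBIN and AndroZoo figures.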


This review was generated by AI and checked by a human editor.