QUICK REVIEW

[Paper Review] Adversarial Perturbations Against Deep Neural Networks for Malware Classification

Kathrin Grosse, Nicolas Papernot|arXiv (Cornell University)|Jun 14, 2016

Advanced Malware Detection Techniques|References: 15|Citations: 320

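One-Sentence Summary

This paper demonstrates adversarial crafting against neural networks that detect Android malware from static features, shows high misclassification rates even though the changes are discrete and functionality-preserving, and evaluates defenses such as distillation and adversarial re-training.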

ABSTRACT

Deep neural networks, like many other machine learning models, have recently been shown to lack robustness against adversarially crafted inputs. These inputs are derived from regular inputs by minor yet carefully selected perturbations that deceive machine learning models into desired misclassifications. Existing work in this emerging field was largely specific to the domain of image classification, since the high-entropy of images can be conveniently manipulated without changing the images' overall visual appearance. Yet, it remains unclear how such attacks translate to more security-sensitive applications such as malware detection - which may pose significant challenges in sample generation and arguably grave consequences for failure. In this paper, we show how to construct highly-effective adversarial sample crafting attacks for neural networks used as malware classifiers. The application domain of malware classification introduces additional constraints in the adversarial sample crafting problem when compared to the computer vision domain: (i) continuous, differentiable input domains are replaced by discrete, often binary inputs; and (ii) the loose condition of leaving visual appearance unchanged is replaced by requiring equivalent functional behavior. We demonstrate the feasibility of these attacks on many different instances of malware classifiers that we trained using the DREBIN Android malware data set. We furthermore evaluate to which extent potential defensive mechanisms against adversarial crafting can be leveraged to the setting of malware classification. While feature reduction did not prove to have a positive impact, distillation and re-training on adversarially crafted samples show promising results.

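Motivation and Objectives

Motivate and examine the robustness of neural networks used for malware classification under adversarial perturbations.
Adapt adversarial crafting techniques from images to discrete, binary malware features.
Assess how malware-specific constraints (discrete features, preserved functionality) affect the feasibility of attacks.
Evaluate defensive strategies for malware classifiers (feature reduction, distillation, adversarial re-training).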

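Proposed Method

Extract static, binary feature vectors from the apps in the DREBIN data set and train several feed-forward neural networks on them.
Represent each application as a high-dimensional binary indicator vector in {0,1}^M, and use a softmax output for the two-class problem (benign vs. malware).
Craft adversarial samples by iteratively computing the forward derivative (Jacobian) of the network and identifying the feature whose addition most increases the probability of the target class. Only feature additions that do not affect functionality are allowed.
Bound the perturbation with an L1-norm constraint so that at most k features are added (k = 20); only additions that do not interfere with existing features are permitted.
To preserve program behavior, restrict changes to the manifest: features are added only through AndroidManifest.xml.
Evaluate the misclassification rate of adversarial samples across network architectures and across malware ratios in the training batches.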
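
The crafting loop described in the list above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the one-hidden-layer ReLU network, its weights, the four-feature input, and the names `forward`, `benign_gradient`, `craft`, and `addable` are all hypothetical. The sketch also assumes that the target class is "benign" (index 0) and that `addable` lists the indices of features that can be added through the manifest without changing behavior.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def forward(x, W1, b1, W2, b2):
    """One hidden ReLU layer, then a 2-way softmax (0 = benign, 1 = malware)."""
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    z = [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(W2, b2)]
    return h, softmax(z)

def benign_gradient(x, W1, b1, W2, b2):
    """Row of the Jacobian: d P(benign) / d x_i for every input feature i."""
    h, p = forward(x, W1, b1, W2, b2)
    # Derivative of p[0] with respect to the two logits.
    dz = [p[0] * (1.0 - p[0]), -p[0] * p[1]]
    # Back through the output layer; inactive ReLU units pass no gradient.
    dh = [(dz[0] * W2[0][j] + dz[1] * W2[1][j]) * (1.0 if h[j] > 0 else 0.0)
          for j in range(len(h))]
    return [sum(dh[j] * W1[j][i] for j in range(len(h))) for i in range(len(x))]

def craft(x, params, addable, k=20):
    """Add at most k features (0 -> 1 only) until the sample looks benign."""
    x = list(x)          # work on a copy; the original sample is untouched
    changed = []
    for _ in range(k):
        _, p = forward(x, *params)
        if p[0] > 0.5:   # already classified as benign: stop
            break
        grad = benign_gradient(x, *params)
        # Only absent, addable features with a positive effect are candidates.
        candidates = [i for i in addable if x[i] == 0 and grad[i] > 0]
        if not candidates:
            break
        best = max(candidates, key=lambda i: grad[i])
        x[best] = 1
        changed.append(best)
    return x, changed

# Hypothetical toy model: features 0-1 push toward malware, 2-3 toward benign.
params = (
    [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 2.0]],  # W1
    [0.0, 0.1],                                    # b1
    [[-1.0, 2.0], [1.0, -2.0]],                    # W2
    [0.0, 0.0],                                    # b2
)
x = [1, 1, 0, 0]                 # toy "malware" sample
adv, changed = craft(x, params, addable=[2, 3], k=20)
print(adv, changed)              # prints [1, 1, 0, 1] [3]
```

Because features are only ever switched from 0 to 1, the number of entries in `changed` equals the L1 distance between the original and the adversarial vector, which is why capping the loop at k iterations enforces the L1 bound.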

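Experimental Results

Research Questions

RQ1: Can neural networks trained on static, binary Android app features achieve state-of-the-art malware detection performance on the DREBIN data set?
RQ2: Are neural networks for malware detection robust to adversarial crafting even when the attacker is restricted to discrete, functionality-preserving feature additions?
RQ3: How effective are defensive strategies (distillation, re-training on adversarial samples) at reducing the adversarial susceptibility of malware classifiers?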

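Key Findings

The neural networks reach roughly 97–98% accuracy on DREBIN, with low false negative rates (≈7%) and low false positive rates (≈3–4%).
Under a limit of 20 changed features, adversarial crafting causes a large share of malware samples to be misclassified, with misclassification rates of roughly 50% to 84% depending on architecture and configuration.
In the discrete domain, feature reduction offers no protection and may even help adversarial crafting.
Distillation lowers the misclassification rate, but the effect is modest.
Re-training on adversarial samples increases resistance, but the effect depends on the choice of hyperparameters.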
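
For readers unfamiliar with distillation as a defense: in its usual formulation, a second network is trained on the first network's softmax outputs computed at a raised temperature T, which yields softer labels than the original hard ones. The sketch below shows only that softening step. The logits and the value T = 10 are hypothetical; this review does not state which temperatures the paper used.

```python
import math

def softmax_with_temperature(logits, T=1.0):
    """Softmax over logits divided by temperature T (T > 1 softens the output)."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, -4.0]  # hypothetical logits for (benign, malware)
hard = softmax_with_temperature(logits, T=1.0)   # close to a one-hot label
soft = softmax_with_temperature(logits, T=10.0)  # softened training target
print(hard, soft)
```

The softened vector keeps the same ranking of classes but with a less extreme probability for the top class, and it is this vector that serves as the training target for the distilled network.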

This review was generated by AI and checked by a human editor.