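[Paper Summary] Adversarial Perturbations Against Deep Neural Networks for Malware Classification
This paper studies adversarial crafting against neural networks that detect Android malware from static features. It shows that high misclassification rates are achievable even when modifications are discrete and must preserve functionality, and it evaluates defenses such as distillation and adversarial re-training.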
Deep neural networks, like many other machine learning models, have recently been shown to lack robustness against adversarially crafted inputs. These inputs are derived from regular inputs by minor yet carefully selected perturbations that deceive machine learning models into desired misclassifications. Existing work in this emerging field was largely specific to the domain of image classification, since the high-entropy of images can be conveniently manipulated without changing the images' overall visual appearance. Yet, it remains unclear how such attacks translate to more security-sensitive applications such as malware detection - which may pose significant challenges in sample generation and arguably grave consequences for failure. In this paper, we show how to construct highly-effective adversarial sample crafting attacks for neural networks used as malware classifiers. The application domain of malware classification introduces additional constraints in the adversarial sample crafting problem when compared to the computer vision domain: (i) continuous, differentiable input domains are replaced by discrete, often binary inputs; and (ii) the loose condition of leaving visual appearance unchanged is replaced by requiring equivalent functional behavior. We demonstrate the feasibility of these attacks on many different instances of malware classifiers that we trained using the DREBIN Android malware data set. We furthermore evaluate to which extent potential defensive mechanisms against adversarial crafting can be leveraged to the setting of malware classification. While feature reduction did not prove to have a positive impact, distillation and re-training on adversarially crafted samples show promising results.
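Motivation and Goals
- Investigate how robust neural networks used for malware classification are to adversarial perturbations.
- Transfer adversarial crafting techniques from the image domain to discrete, binary malware features.
- Assess how malware-specific constraints (discrete features, preserved functionality) affect the feasibility of attacks.
- Evaluate defense strategies for malware classifiers (feature reduction, distillation, adversarial re-training).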
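Proposed Method
- Train multiple feed-forward neural networks on static binary feature vectors extracted from Android applications in the DREBIN data set.
- Represent each application as a high-dimensional binary indicator vector X ∈ {0,1}^M, and use a softmax output for binary classification (benign vs. malware).
- Craft adversarial samples by iteratively computing the forward derivative (the Jacobian of the network output with respect to the input) to find the feature addition that most increases the target-class probability, adding only features that do not affect functionality.
- Bound the perturbation in the L1 norm, which amounts to adding at most k features (k = 20), and allow only additions that do not interfere with existing features.
- Restrict perturbations to manifest-based changes: features are added only through AndroidManifest.xml, so program behavior is preserved.
- Evaluate the misclassification rate of adversarial samples across network architectures and across malware ratios in the training batches.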
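The crafting loop described in the list above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the one-hidden-layer ReLU network, the `params` tuple, the `allowed` mask (standing in for features that can be added via the manifest), and the function names are all hypothetical. Class 0 is assumed to be "benign" and serves as the target class.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def forward(x, params):
    """Toy one-hidden-layer ReLU network (an assumption, not the paper's architecture)."""
    W1, b1, W2, b2 = params
    a = W1 @ x + b1                 # hidden pre-activation
    h = np.maximum(a, 0.0)          # ReLU
    return a, softmax(W2 @ h + b2)  # class probabilities

def grad_target_prob(x, params, target):
    """Gradient of the target-class probability w.r.t. the input (one Jacobian row)."""
    W1, b1, W2, b2 = params
    a, p = forward(x, params)
    onehot = np.zeros_like(p)
    onehot[target] = 1.0
    dlogits = p[target] * (onehot - p)  # softmax derivative for class `target`
    dh = W2.T @ dlogits
    return W1.T @ (dh * (a > 0))        # backpropagate through ReLU and first layer

def craft_adversarial(x, allowed, params, target=0, k=20):
    """Add at most k features (0 -> 1), chosen greedily by the forward derivative."""
    x_adv = np.asarray(x, dtype=float).copy()
    for _ in range(k):
        _, p = forward(x_adv, params)
        if p.argmax() == target:             # already classified as the target class
            break
        grad = grad_target_prob(x_adv, params, target)
        candidates = allowed & (x_adv == 0)  # only add features, never remove any
        if not candidates.any():
            break
        scores = np.where(candidates, grad, -np.inf)
        best = int(scores.argmax())
        if scores[best] <= 0:                # extra stopping rule (our choice): no addition helps
            break
        x_adv[best] = 1.0
    return x_adv
```

Because features are only ever flipped from 0 to 1 and only inside `allowed`, the L1 distance between `x` and `x_adv` equals the number of added features and never exceeds `k`.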
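Experimental Results
Research Questions
- RQ1: Can neural networks trained on static, binary Android app features from the DREBIN data set achieve state-of-the-art malware detection performance?
- RQ2: Are neural networks robust to adversarial crafting when the attacker is constrained to discrete, functionality-preserving feature additions, as in malware detection?
- RQ3: How effective are defenses (distillation, re-training on adversarial samples) at reducing a malware classifier's susceptibility to adversarial samples?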
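Key Findings
- The neural networks reach roughly 97–98% accuracy on DREBIN, with a low false negative rate (about 7%) and a false positive rate of about 3–4%.
- Adversarial crafting causes a large share of malware samples to be misclassified: with at most 20 feature modifications, misclassification rates range from about 50% to 84%, depending on architecture and settings.
- Feature reduction offers no protection in this discrete domain and may even make adversarial crafting easier.
- Distillation lowers the misclassification rate, but the improvement is limited.
- Re-training on adversarial samples improves resistance, although its effectiveness depends on the choice of hyperparameters.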
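To make the reported quantities concrete, here is a small sketch of how accuracy, false negative rate, false positive rate, and the adversarial misclassification rate might be computed. The function names are hypothetical and the definitions are the conventional ones, assumed rather than taken from the paper: labels are 1 for malware and 0 for benign, and the misclassification rate is the share of adversarially modified malware samples that the classifier labels benign.

```python
import numpy as np

def detection_rates(y_true, y_pred):
    """Accuracy, false negative rate, and false positive rate (1 = malware, 0 = benign).

    Assumes both classes occur in y_true, so neither denominator is zero.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))  # malware missed by the classifier
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))  # benign apps flagged as malware
    return {
        "accuracy": float((tp + tn) / len(y_true)),
        "fnr": float(fn / (fn + tp)),
        "fpr": float(fp / (fp + tn)),
    }

def misclassification_rate(pred_on_adversarial_malware):
    """Share of adversarial malware samples predicted as benign (label 0)."""
    pred = np.asarray(pred_on_adversarial_malware)
    return float(np.mean(pred == 0))
```

Under these definitions, a 7% false negative rate means 7 of every 100 unmodified malware samples slip through, while an 84% misclassification rate means 84 of every 100 adversarially modified malware samples do.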
This summary was generated by AI and reviewed by a human editor.