QUICK REVIEW

[Paper Review] Blacklight: Defending Black-Box Adversarial Attacks on Deep Neural Networks.

Huiying Li, Shawn Shan|arXiv (Cornell University)|Jun 24, 2020
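Adversarial Robustness in Machine Learning | References: 56 | Cited by: 24
One-Sentence Summary

Blacklight is a new defense against black-box adversarial attacks that detects malicious queries by generating a robust one-way hash fingerprint for each input image. These fingerprints stay nearly unchanged under small image perturbations, which lets Blacklight detect attack queries early, within only a few queries, while remaining effective against strong query-efficient attacks and advanced countermeasures.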

ABSTRACT

The vulnerability of deep neural networks (DNNs) to adversarial examples is well documented. Under the strong white-box threat model, where attackers have full access to DNN internals, recent work has produced continual advancements in defenses, often followed by more powerful attacks that break them. Meanwhile, research on the more realistic black-box threat model has focused almost entirely on reducing the query-cost of attacks, making them increasingly practical for ML models already deployed today. This paper proposes and evaluates Blacklight, a new defense against black-box adversarial attacks. Blacklight targets a key property of black-box attacks: to compute adversarial examples, they produce sequences of highly similar images while trying to minimize the distance from some initial benign input. To detect an attack, Blacklight computes for each query image a compact set of one-way hash values that form a probabilistic fingerprint. Variants of an image produce nearly identical fingerprints, and fingerprint generation is robust against manipulation. We evaluate Blacklight on 5 state-of-the-art black-box attacks, across a variety of models and classification tasks. While the most efficient attacks take thousands or tens of thousands of queries to complete, Blacklight identifies them all, often after only a handful of queries. Blacklight is also robust against several powerful countermeasures, including an optimal black-box attack that approximates white-box attacks in efficiency. Finally, Blacklight significantly outperforms the only known alternative in both detection coverage of attack queries and resistance against persistent attackers.

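Research Motivation and Objectives

  • Counter the growing threat of practical black-box adversarial attacks, which exploit query efficiency to evade detection.
  • Detect adversarial queries as early as possible in an attack, before the model is significantly exposed.
  • Design a defense that remains effective even against an optimal black-box attack that approaches white-box efficiency.
  • Outperform existing defenses in detection coverage and in robustness against persistent attackers.
  • Provide a lightweight, scalable solution suited to real-world deployment of machine learning models.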

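Proposed Method

  • Blacklight generates a compact set of one-way hash values for each input image, forming a probabilistic fingerprint that stays nearly unchanged under small adversarial perturbations.
  • Fingerprint generation is designed to be robust against image manipulation, so that variants of the same image (such as adversarially perturbed copies) produce nearly identical fingerprints.
  • Detection is triggered when a sequence of queries produces fingerprints that are too similar, which indicates an adversarial attack.
  • The method rests on the assumption that black-box attacks generate sequences of images highly similar to the original benign input while trying to minimize the distance from it.
  • It requires no access to model internals or gradients, so it suits black-box deployments.
  • The defense is independent of the underlying model architecture and classification task, making it broadly applicable.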
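The fingerprint-and-match idea above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the summary only says that each query gets a compact set of one-way hash values and that near-identical fingerprints trigger detection. The quantization step, sliding-window hashing, top-k selection, the salt, the match threshold, and the names `fingerprint` and `SimilarityDetector` are all hypothetical choices made for this sketch.

```python
import hashlib
import random


def fingerprint(pixels, quant_step=32, window=20, top_k=50, salt=b"demo-salt"):
    """Return a set of up to top_k SHA-256 digests for a flattened 8-bit image.

    All parameter values are illustrative assumptions, not the paper's settings.
    """
    # Coarse quantization: small pixel changes usually land in the same bucket.
    quantized = bytes(p // quant_step for p in pixels)
    digests = set()
    # Hash every sliding window of the quantized pixel sequence.
    for start in range(len(quantized) - window + 1):
        segment = quantized[start:start + window]
        digests.add(hashlib.sha256(salt + segment).hexdigest())
    # Keep only the top_k largest digests as the compact fingerprint.
    return set(sorted(digests, reverse=True)[:top_k])


class SimilarityDetector:
    """Flags a query whose fingerprint overlaps heavily with an earlier one."""

    def __init__(self, match_threshold=25):
        self.match_threshold = match_threshold  # assumed value for the demo
        self.seen = []  # fingerprints of all earlier queries

    def check(self, pixels):
        fp = fingerprint(pixels)
        flagged = any(len(fp & old) >= self.match_threshold for old in self.seen)
        self.seen.append(fp)
        return flagged


rng = random.Random(0)
# Synthetic "image": values sit mid-bucket so the demo is deterministic.
# Real images have pixels near bucket edges, so matching is probabilistic.
benign = [rng.randrange(8) * 32 + 16 for _ in range(1024)]
# Attack-like variant: tiny noise on every pixel plus one large change.
perturbed = [p + rng.randint(-3, 3) for p in benign]
perturbed[500] = (perturbed[500] + 128) % 256
# An unrelated query drawn independently of the benign image.
other = [rng.randrange(256) for _ in range(1024)]

detector = SimilarityDetector()
first = detector.check(benign)      # nothing stored yet
second = detector.check(perturbed)  # near-duplicate of the first query
third = detector.check(other)       # unrelated image
print(first, second, third)
```

In this toy setup only the second query should be flagged: the small noise is absorbed by quantization, and the single large change can alter at most `window` of the hashed segments, so most of the top-k digests survive. A real deployment would also need to bound how long fingerprints are stored and tune the threshold against false positives on benign traffic, which this sketch does not address.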

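Experimental Results

Research Questions

  • RQ1: Can a lightweight, query-agnostic defense detect black-box adversarial attacks early, with very low query exposure?
  • RQ2: How effective is Blacklight against state-of-the-art low-query black-box attacks that approximate white-box efficiency?
  • RQ3: Can Blacklight maintain high detection accuracy when attackers continually adapt to evade detection?
  • RQ4: How does Blacklight compare with existing defenses in detection coverage and in robustness against advanced countermeasures?
  • RQ5: Does the fingerprinting mechanism remain robust under various image transformations and adversarial perturbations?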

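Key Findings

  • Blacklight detected all five state-of-the-art black-box attacks evaluated, typically identifying an attack after only 2–5 queries.
  • The defense maintains high detection accuracy even when attacks are optimized to minimize query cost and mimic white-box behavior.
  • Blacklight significantly outperforms the only known alternative defense in detection coverage of attack queries and in resistance against persistent attackers.
  • The method remains effective against an optimal black-box attack that achieves near-white-box query efficiency.
  • The fingerprinting mechanism is robust against image manipulation, which keeps detection consistent under adversarial perturbations.
  • Blacklight needs no access to model internals and can be deployed directly in real-world, production-grade systems.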
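Two quantities recur in these findings: detection coverage (the share of attack queries that are flagged) and the number of queries issued before the first detection. The short sketch below shows one plausible way to compute both from per-query detector output. The function name `detection_metrics` and the `flags` list are hypothetical, and the flag values are made up for illustration; they are not results from the paper.

```python
def detection_metrics(flags):
    """Compute (coverage, first_hit) from per-query detection flags.

    flags[i] is True if the (i + 1)-th attack query was flagged.
    coverage is the fraction of flagged queries; first_hit is the 1-based
    index of the first flagged query, or None if nothing was flagged.
    """
    total = len(flags)
    coverage = sum(flags) / total if total else 0.0
    first_hit = next((i + 1 for i, hit in enumerate(flags) if hit), None)
    return coverage, first_hit


# Made-up flags for an 8-query attack trace (illustration only).
flags = [False, False, True, True, True, False, True, True]
coverage, first_hit = detection_metrics(flags)
print(coverage, first_hit)
```

Under this definition, a defense that flags an attack "after only a handful of queries" has a small `first_hit`, while high coverage means few attack queries slip through afterwards.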


This review was generated by AI and reviewed by a human editor.