QUICK REVIEW

[Paper Review] A Boundary Tilting Perspective on the Phenomenon of Adversarial Examples

Thomas Tanay, Lewis D. Griffin | arXiv (Cornell University) | Aug 27, 2016
Adversarial Robustness in Machine Learning | References: 9 | Cited by: 136
One-Sentence Summary

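The paper critiques the linear explanation of adversarial examples and proposes a boundary tilting framework, showing how adversarial strength depends on how far the classification boundary deviates from the nearest centroid boundary and how closely it follows the data submanifold. It also links adversarial strength to regularisation and provides an analysis of linear classification supported by SVM experiments.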

ABSTRACT

Deep neural networks have been shown to suffer from a surprising weakness: their classification outputs can be changed by small, non-random perturbations of their inputs. This adversarial example phenomenon has been explained as originating from deep networks being "too linear" (Goodfellow et al., 2014). We show here that the linear explanation of adversarial examples presents a number of limitations: the formal argument is not convincing, linear classifiers do not always suffer from the phenomenon, and when they do their adversarial examples are different from the ones affecting deep networks. We propose a new perspective on the phenomenon. We argue that adversarial examples exist when the classification boundary lies close to the submanifold of sampled data, and present a mathematical analysis of this new perspective in the linear case. We define the notion of adversarial strength and show that it can be reduced to the deviation angle between the classifier considered and the nearest centroid classifier. Then, we show that the adversarial strength can be made arbitrarily high independently of the classification performance due to a mechanism that we call boundary tilting. This result leads us to defining a new taxonomy of adversarial examples. Finally, we show that the adversarial strength observed in practice is directly dependent on the level of regularisation used and the strongest adversarial examples, symptomatic of overfitting, can be avoided by using a proper level of regularisation.

Research Motivation and Objectives

  • Motivate a shift from purely linear explanations of adversarial examples to a boundary tilting perspective.
  • Characterise when adversarial examples exist by considering the alignment between decision boundaries and data submanifolds.
  • Quantify adversarial strength in linear models and relate it to deviation from the nearest centroid classifier.
  • Investigate how boundary tilting and regularisation influence adversarial strength and model robustness.
  • Propose a taxonomy of adversarial examples based on boundary geometry and data distribution.

Proposed Method

  • Define a strict non-existence condition for adversarial examples using a linear classifier boundary and mirror images.
  • Introduce the strength measure s(I, C) = arctan(||j - m(i, C)|| / ||i - m(i, C)||), where m(i, C) denotes the mirror image of i through the boundary of C, and show it reduces to the deviation angle delta_c between C and the nearest-centroid boundary B.
  • Express classifier deviation as c = cos(delta_c) b + sin(delta_c) b_perp_c and derive s(I, C) and s(J, C) formulas depending on delta_c and r_c = c_0/||i||.
  • Demonstrate that boundary tilting can yield arbitrarily strong adversarial examples without harming performance, via analysis of r_c and delta_c.
  • Show that with high regularisation, adversarial strength shrinks toward the nearest centroid classifier; low regularisation increases overfitting and boundary tilting.
  • Provide experimental intuition using SVM to relate observed adversarial strength to regularisation levels.
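
The relation between the strength measure and the deviation angle listed above can be checked numerically. The following is a minimal sketch, not the authors' code. It assumes the two class centroids are placed symmetrically about the origin (j = -i), that the boundary of C is the hyperplane {z : z.c + c_0 = 0} with unit normal c (the sign convention for c_0 is an assumption here), and that c_0 = 0. The helper names `mirror`, `adversarial_strength` and `tilted_normal` are hypothetical.

```python
import numpy as np

def mirror(x, c, c0=0.0):
    """Reflect x through the hyperplane {z : z.c + c0 = 0}; c must be a unit vector."""
    return x - 2.0 * (x @ c + c0) * c

def adversarial_strength(i, j, c, c0=0.0):
    """s = arctan(||j - m|| / ||i - m||), with m the mirror image of i through C."""
    m = mirror(i, c, c0)
    return float(np.arctan2(np.linalg.norm(j - m), np.linalg.norm(i - m)))

def tilted_normal(b, b_perp, delta):
    """c = cos(delta) b + sin(delta) b_perp, for orthonormal b and b_perp."""
    return np.cos(delta) * b + np.sin(delta) * b_perp

# Assumption of this sketch: centroids are symmetric about the origin.
i = np.array([2.0, 0.0, 0.0])
j = -i
b = i / np.linalg.norm(i)            # unit normal of the nearest-centroid boundary B
b_perp = np.array([0.0, 1.0, 0.0])   # any unit vector orthogonal to b

for delta in (0.1, 0.5, 1.0, 1.4):
    c = tilted_normal(b, b_perp, delta)
    s = adversarial_strength(i, j, c, c0=0.0)
    print(f"delta_c = {delta:.2f} -> s(I, C) = {s:.4f}")
```

Under these assumptions each printed strength equals the tilt angle up to rounding, since ||i - m|| = 2||i|| cos(delta_c) and ||j - m|| = 2||i|| sin(delta_c). Passing a non-zero `c0` shows how the offset changes the strength for a fixed angle.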

Experimental Results

Research Questions

  • RQ1: Under what geometric conditions do adversarial examples exist when the data lie on a submanifold and the decision boundary lies near it?
  • RQ2: How can adversarial strength be quantified in linear models, and what role does the deviation from the nearest centroid boundary play?
  • RQ3: Can boundary tilting cause strong adversarial examples without sacrificing classification accuracy, and how does regularisation modulate this effect?
  • RQ4: What is the relationship between boundary geometry (deviation angle) and the strength of adversarial examples across data distributions?
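
RQ3 can be probed on a toy case. The sketch below is an illustration under stated assumptions, not an experiment from the paper: synthetic Gaussian data lie in a plane embedded in three dimensions, so the third axis has exactly zero variance, and the boundary is a hyperplane through the origin whose unit normal is tilted from the sample centroid direction towards that axis. The function name `tilt_experiment` and all parameter values are hypothetical.

```python
import numpy as np

def tilt_experiment(deltas, n=200, seed=0):
    """Return (delta, accuracy, median distance to boundary) for each tilt angle."""
    rng = np.random.default_rng(seed)
    y = np.concatenate([np.ones(n), -np.ones(n)])
    # Data live in the plane spanned by axes 0 and 1; axis 2 has zero variance.
    X = np.zeros((2 * n, 3))
    X[:, :2] = rng.normal(size=(2 * n, 2))
    X[:, 0] += 3.0 * y                      # classes separated along axis 0
    diff = X[y > 0].mean(axis=0) - X[y < 0].mean(axis=0)
    b = diff / np.linalg.norm(diff)         # direction joining the sample centroids
    e = np.array([0.0, 0.0, 1.0])           # zero-variance direction, orthogonal to b
    rows = []
    for delta in deltas:
        c = np.cos(delta) * b + np.sin(delta) * e   # tilted unit normal
        scores = X @ c
        accuracy = float(np.mean(np.sign(scores) == y))
        # |scores| is the L2 distance from each point to the tilted boundary.
        median_dist = float(np.median(np.abs(scores)))
        rows.append((delta, accuracy, median_dist))
    return rows

for delta, acc, dist in tilt_experiment([0.0, 0.5, 1.0, 1.5]):
    print(f"tilt = {delta:.1f} rad, accuracy = {acc:.3f}, median distance = {dist:.3f}")
```

Because every data point has a zero third coordinate, the scores are only rescaled by cos(delta), so the predicted labels do not change while the distance from each point to the boundary shrinks. In this toy setting a smaller perturbation therefore suffices to cross the boundary as the tilt grows, with no change in accuracy on the sampled data.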

Key Findings

  • Adversarial strength can be characterised by the deviation angle delta_c between a classifier’s boundary and the nearest centroid boundary.
  • When the boundary tilts along directions with low data variance, adversarial strength can become arbitrarily large without degrading classification performance.
  • The strength measure simplifies to s ≈ |delta_c| when r_c ≈ 0, linking robustness directly to boundary alignment with data means.
  • Stronger regularisation reduces adversarial strength, pushing the classifier toward the nearest centroid, while weak regularisation fosters boundary tilting and stronger adversarial examples.
  • Experiments with linear models (SVM) suggest practical control of adversarial strength through regularisation, contrasting with deeper networks where perturbations can be imperceptible.
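
The regularisation finding above can be illustrated with a small numerical sketch. Note the substitution: instead of the SVM used in the paper, this example uses a ridge-regularised least-squares classifier without intercept, chosen because it has a closed-form solution in NumPy. The synthetic data, the penalty values, and the helper names `make_data`, `ridge_weights` and `centroid_deviation` are all assumptions of this sketch.

```python
import numpy as np

def make_data(n=500, seed=0):
    """Two Gaussian classes at +mu and -mu; axis 1 has much lower variance than axis 0."""
    rng = np.random.default_rng(seed)
    mu = np.array([1.0, 0.05])
    scales = np.array([1.0, 0.02])
    y = np.concatenate([np.ones(n), -np.ones(n)])
    X = rng.normal(size=(2 * n, 2)) * scales + y[:, None] * mu
    return X, y

def ridge_weights(X, y, lam):
    """Closed-form ridge solution w = (X^T X + lam I)^{-1} X^T y (no intercept)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def centroid_deviation(w, X, y):
    """Angle (radians) between w and the direction joining the two class centroids."""
    b = X[y > 0].mean(axis=0) - X[y < 0].mean(axis=0)
    cos = (w @ b) / (np.linalg.norm(w) * np.linalg.norm(b))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

X, y = make_data()
for lam in (1e-3, 1e0, 1e3, 1e6):
    w = ridge_weights(X, y, lam)
    angle = centroid_deviation(w, X, y)
    print(f"lambda = {lam:g}: deviation from centroid direction = {angle:.3f} rad")
```

With a small penalty the weight vector leans heavily towards the low-variance axis and sits far from the centroid direction. With a large penalty the solution is dominated by X^T y, which is proportional to the difference of the class centroids for balanced classes, so the deviation angle approaches zero. This mirrors the qualitative trend reported for SVMs but does not reproduce the paper's experiments.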


This review was generated by AI and checked by human editors.