[Paper Review] Characterizing Adversarial Subspaces Using Local Intrinsic Dimensionality
Introduces Local Intrinsic Dimensionality (LID) to characterize adversarial regions in DNNs and demonstrates LID-based detection can outperform KD/BU detectors across several attacks and datasets.
Deep Neural Networks (DNNs) have recently been shown to be vulnerable against adversarial examples, which are carefully crafted instances that can mislead DNNs to make errors during prediction. To better understand such attacks, a characterization is needed of the properties of regions (the so-called 'adversarial subspaces') in which adversarial examples lie. We tackle this challenge by characterizing the dimensional properties of adversarial regions, via the use of Local Intrinsic Dimensionality (LID). LID assesses the space-filling capability of the region surrounding a reference example, based on the distance distribution of the example to its neighbors. We first provide explanations about how adversarial perturbation can affect the LID characteristic of adversarial regions, and then show empirically that LID characteristics can facilitate the distinction of adversarial examples generated using state-of-the-art attacks. As a proof-of-concept, we show that a potential application of LID is to distinguish adversarial examples, and the preliminary results show that it can outperform several state-of-the-art detection measures by large margins for five attack strategies considered in this paper across three benchmark datasets. Our analysis of the LID characteristic for adversarial regions not only motivates new directions of effective adversarial defense, but also opens up more challenges for developing new attacks to better understand the vulnerabilities of DNNs.
Motivation & Objective
- Motivate a dimensionality-based understanding of adversarial regions within DNN representations.
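- Use and formally define Local Intrinsic Dimensionality (LID), an existing measure of how fast the distance distribution around a reference point grows, as the tool for characterizing adversarial regions.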
- Empirically show LID distinguishes adversarial from normal/noisy data across layers and attacks.
- Demonstrate LID-based detectors can outperform existing KD and BU detectors on multiple datasets and attacks.
- Discuss implications for adversarial defense and attack analysis.
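For reference, the standard LID definition from the intrinsic-dimensionality literature is given below. It treats $F$ as the cumulative distribution function of distances from a reference point and assumes $F$ is positive and differentiable near $r$. The worked check on the last line shows why the quantity deserves the name "dimensionality": when points fill an $m$-dimensional ball uniformly, the estimate recovers $m$.

```latex
\mathrm{LID}_F(r) \;=\; \lim_{\varepsilon \to 0^+}
  \frac{\ln F\big((1+\varepsilon)\,r\big) - \ln F(r)}{\ln(1+\varepsilon)}
  \;=\; \frac{r\,F'(r)}{F(r)},
\qquad
\mathrm{LID}_F \;=\; \lim_{r \to 0^+} \mathrm{LID}_F(r).

% Worked check: if F(r) = c\,r^m (uniform spread in an m-dimensional ball), then
\frac{r\,F'(r)}{F(r)} \;=\; \frac{r \cdot c\,m\,r^{m-1}}{c\,r^{m}} \;=\; m .
```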
Proposed method
- Define LID based on the local growth of the distance distribution around a reference point.
- Estimate LID using a maximum likelihood estimator over the k nearest neighbors (the MLE formula is Eq. 4 in the original paper).
- Compute LID across all transformation layers of a DNN using activations as features.
- Generate adversarial and noisy counterparts for training data to build an LID-based detector.
- Train a logistic regression classifier using LID-based features to separate adversarial from normal/noisy samples.
- Evaluate detectors against five attacks (FGM, BIM-a, BIM-b, JSMA, Opt) on MNIST, CIFAR-10, and SVHN.
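The estimation and feature-construction steps above can be sketched in NumPy. The estimator is $\widehat{\mathrm{LID}}(x) = -\big(\tfrac{1}{k}\sum_{i=1}^{k}\log\tfrac{r_i(x)}{r_k(x)}\big)^{-1}$, where $r_i(x)$ is the distance from $x$ to its $i$-th nearest neighbor in a reference sample. This is a minimal sketch under stated assumptions, not the authors' code: distances are Euclidean on flattened activations, neighbors are drawn from a reference minibatch of normal examples, and the function names `mle_lid` and `lid_features` plus the `1e-12` numerical guards are hypothetical additions.

```python
import numpy as np

def mle_lid(queries, reference, k=20, exclude_self=False):
    """MLE estimate of LID for each row of `queries`, using its k nearest
    (Euclidean) neighbors among the rows of `reference`."""
    q = np.asarray(queries, dtype=np.float64).reshape(len(queries), -1)
    r = np.asarray(reference, dtype=np.float64).reshape(len(reference), -1)
    # Pairwise squared distances via ||a||^2 + ||b||^2 - 2 a.b, clipped at 0.
    d2 = (q ** 2).sum(1)[:, None] + (r ** 2).sum(1)[None, :] - 2.0 * q @ r.T
    dist = np.sqrt(np.maximum(d2, 0.0))
    if exclude_self:
        # Queries and reference are the same batch: ignore each point's self-distance.
        np.fill_diagonal(dist, np.inf)
    # Sorted distances r_1(x) <= ... <= r_k(x); the floor avoids log(0) for duplicates.
    knn = np.maximum(np.sort(dist, axis=1)[:, :k], 1e-12)
    mean_log = np.mean(np.log(knn / knn[:, -1:]), axis=1)  # always <= 0
    # LID_hat = -(mean_log)^-1; guard the degenerate case where all k distances are equal.
    return -1.0 / np.minimum(mean_log, -1e-12)

def lid_features(query_layers, reference_layers, k=20, exclude_self=False):
    """Stack one LID estimate per layer into an (n_queries, n_layers) feature matrix."""
    return np.stack(
        [mle_lid(qa, ra, k, exclude_self) for qa, ra in zip(query_layers, reference_layers)],
        axis=1,
    )
```

With per-layer activation lists for a normal minibatch (`acts_normal`, a hypothetical name), `lid_features(acts_normal, acts_normal, exclude_self=True)` scores the normal points, while `lid_features(acts_adv, acts_normal)` scores their adversarial counterparts against the same normal neighbors.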
Experimental results
Research questions
- RQ1: Can LID capture the intrinsic dimensional properties of adversarial regions?
- RQ2: Are LID-based features effective for distinguishing adversarial from normal and noisy inputs across multiple attacks and datasets?
- RQ3: How does LID performance vary across DNN layers (convolutional vs. dense/softmax) and attacks?
- RQ4: Do LID-based detectors generalize across different attack strategies?
Key findings
Detection AUC (%) by feature set and attack (higher is better):

| Dataset | Feature | FGM | BIM-a | BIM-b | JSMA | Opt |
|---|---|---|---|---|---|---|
| MNIST | KD | 78.12 | 98.14 | 98.61 | 68.77 | 95.15 |
| MNIST | BU | 32.37 | 91.55 | 25.46 | 88.74 | 71.30 |
| MNIST | KD+BU | 82.43 | 99.20 | 98.81 | 90.12 | 95.35 |
| MNIST | LID | 96.89 | 99.60 | 99.83 | 92.24 | 99.24 |
| CIFAR-10 | KD | 64.92 | 68.38 | 98.70 | 85.77 | 91.35 |
| CIFAR-10 | BU | 70.53 | 81.60 | 97.32 | 87.36 | 91.39 |
| CIFAR-10 | KD+BU | 70.40 | 81.33 | 98.90 | 88.91 | 93.77 |
| CIFAR-10 | LID | 82.38 | 82.51 | 99.78 | 95.87 | 98.94 |
| SVHN | KD | 70.39 | 77.18 | 99.57 | 86.46 | 87.41 |
| SVHN | BU | 86.78 | 84.07 | 86.93 | 91.33 | 87.13 |
| SVHN | KD+BU | 86.86 | 83.63 | 99.52 | 93.19 | 90.66 |
| SVHN | LID | 97.61 | 87.55 | 99.72 | 95.07 | 97.60 |
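The scores in the table are read as ROC AUC percentages, consistent with the 99.24% AUC quoted for LID against Opt on MNIST. As a reminder of what such a number means, AUC equals the probability that a randomly chosen adversarial example receives a higher detector score than a randomly chosen normal or noisy one. The sketch below computes it directly from that pairwise definition (ties count one half); `roc_auc` is a hypothetical helper, not a library function, and it is quadratic in the sample sizes, so it suits small illustrations only.

```python
import numpy as np

def roc_auc(scores_pos, scores_neg):
    """AUC as P(score_pos > score_neg), counting ties as 1/2.
    scores_pos: detector scores for adversarial inputs; scores_neg: normal/noisy inputs."""
    pos = np.asarray(scores_pos, dtype=float)[:, None]
    neg = np.asarray(scores_neg, dtype=float)[None, :]
    # Compare every positive with every negative and average the outcomes.
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))
```

Multiplying the result by 100 gives the percentage scale used in the table.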
- LID estimates for adversarial examples are consistently higher than those for normal or noisy examples, especially in deeper layers.
- LID-based detectors achieve the highest AUC in every cell of the table (e.g., 99.24% against Opt on MNIST). The margin over the best of KD, BU, and KD+BU ranges from 0.15 points (SVHN, BIM-b) to about 14.5 points (MNIST, FGM).
- LID-based discrimination remains robust across different network layers and shows stronger separation in deeper layers.
- Detectors trained on simple attacks (e.g., FGM) can generalize to detect more complex attacks.
- LID appears more stable than KD under parameter variations, although its parameters still require dataset-specific tuning.
- Adversarial regions across attacks share similar dimensional properties, enabling cross-attack detectability.
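The detector behind these findings is a logistic regression over per-layer LID features. A minimal sketch follows, assuming labels 1 = adversarial and 0 = normal or noisy; plain NumPy gradient descent stands in for whatever logistic-regression implementation the authors used, and the names `fit_logreg` and `predict_proba` and the values of `lr`, `epochs`, and `l2` are illustrative assumptions.

```python
import numpy as np

def fit_logreg(X, y, lr=0.1, epochs=500, l2=1e-3):
    """Batch gradient descent on L2-regularized logistic loss.
    X: (n, n_layers) LID features; y: 1 = adversarial, 0 = normal/noisy."""
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    # Standardize each feature so one learning rate suits all layers.
    mu, sd = X.mean(axis=0), X.std(axis=0) + 1e-8
    Z = (X - mu) / sd
    w, b = np.zeros(Z.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))  # predicted P(adversarial)
        grad = p - y                             # gradient of log loss w.r.t. logits
        w -= lr * (Z.T @ grad / len(y) + l2 * w)
        b -= lr * grad.mean()
    return mu, sd, w, b

def predict_proba(model, X):
    """Probability of the adversarial class under a fitted model."""
    mu, sd, w, b = model
    Z = (np.asarray(X, dtype=np.float64) - mu) / sd
    return 1.0 / (1.0 + np.exp(-(Z @ w + b)))
```

To mimic the cross-attack setting, one would fit on LID features from one attack (e.g., `fit_logreg(X_fgm, y_fgm)`) and score features from another (`predict_proba(model, X_bim)`), where `X_fgm`, `y_fgm`, and `X_bim` are hypothetical arrays built from per-layer LID estimates.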
This review was created by AI and reviewed by human editors.