QUICK REVIEW

[Paper Review] Generative Poisoning Attack Method Against Neural Networks

Chaofei Yang, Qing Wu|arXiv (Cornell University)|Mar 3, 2017

Adversarial Robustness in Machine Learning | 16 references | 148 citations

TL;DR

The paper presents a generative method for crafting poisoned training inputs that degrade the accuracy of neural networks. It produces poisoned data up to 239.38× faster than the direct gradient method, is evaluated on MNIST and CIFAR-10, and is paired with a loss-based countermeasure.

ABSTRACT

Poisoning attack is identified as a severe security threat to machine learning algorithms. In many applications, for example, deep neural network (DNN) models collect public data as the inputs to perform re-training, where the input data can be poisoned. Although poisoning attack against support vector machines (SVM) has been extensively studied before, there is still very limited knowledge about how such attack can be implemented on neural networks (NN), especially DNNs. In this work, we first examine the possibility of applying traditional gradient-based method (named as the direct gradient method) to generate poisoned data against NNs by leveraging the gradient of the target model w.r.t. the normal data. We then propose a generative method to accelerate the generation rate of the poisoned data: an auto-encoder (generator) used to generate poisoned data is updated by a reward function of the loss, and the target NN model (discriminator) receives the poisoned data to calculate the loss w.r.t. the normal data. Our experiment results show that the generative method can speed up the poisoned data generation rate by up to 239.38x compared with the direct gradient method, with slightly lower model accuracy degradation. A countermeasure is also designed to detect such poisoning attack methods by checking the loss of the target model.
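To make the cost of the direct gradient method concrete, here is a minimal sketch on a toy problem. It is not the paper's Algorithm 1. It assumes a logistic-regression victim (standing in for the target NN) that re-trains with a single SGD step on one poisoned point, and all names (`retrain_step`, `poison_grad`, `direct_gradient_poison`) and hyperparameters (`eta`, `lr`, `steps`) are hypothetical. The attacker runs projected gradient ascent on the poisoned input to raise the victim's loss on normal data. Even in this tiny case, the chain rule passes through a mixed second derivative of the training loss, which is the term that becomes expensive for large networks.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def clean_loss(w, X, y):
    """Mean logistic loss of weights w on normal (clean) data; labels y in {0, 1}."""
    z = X @ w
    return float(np.mean(np.logaddexp(0.0, z) - y * z))


def clean_loss_grad(w, X, y):
    """Gradient of clean_loss with respect to the weights w."""
    return X.T @ (sigmoid(X @ w) - y) / len(y)


def retrain_step(w, x_p, y_p, eta):
    """Hypothetical victim re-training: one SGD step on the poisoned point (x_p, y_p)."""
    return w - eta * (sigmoid(w @ x_p) - y_p) * x_p


def poison_grad(w, x_p, y_p, X, y, eta):
    """d clean_loss(retrain_step(w, x_p)) / d x_p via the chain rule.

    The Jacobian of the SGD update w.r.t. x_p is the mixed second derivative
    -eta * d^2 l / (dw dx_p) = -eta * [s(1-s) x_p w^T + (s - y_p) I].
    """
    s = sigmoid(w @ x_p)
    g = clean_loss_grad(retrain_step(w, x_p, y_p, eta), X, y)
    # Jacobian-transpose times g, written out without forming the matrix.
    return -eta * (s * (1 - s) * w * (x_p @ g) + (s - y_p) * g)


def direct_gradient_poison(w, x_p, y_p, X, y, eta=0.5, lr=0.1, steps=50):
    """Projected gradient ascent on the poisoned input, kept inside the [0, 1] pixel box."""
    x_p = x_p.astype(float).copy()
    for _ in range(steps):
        x_p = np.clip(x_p + lr * poison_grad(w, x_p, y_p, X, y, eta), 0.0, 1.0)
    return x_p
```

For example, with synthetic 2-D data `X`, labels `y`, current weights `w = np.array([2.0, -2.0])` and a seed point `x0 = np.array([0.5, 0.5])` labeled 1, `direct_gradient_poison(w, x0, 1.0, X, y)` returns a point whose one-step re-training raises the clean loss. A real NN would need this Hessian-vector product at every ascent step, which is why the sketch only illustrates the bottleneck rather than reproducing it at scale.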

Motivation & Objective

Investigate the feasibility of poisoning attacks on neural networks using gradient-based methods.
Develop a generative (autoencoder-based) approach to accelerate poisoned data generation.
Compare the generative method with direct gradient attack in terms of speed and impact on model accuracy.
Propose a low-overhead loss-based countermeasure to detect poisoning attacks.
Evaluate effectiveness on MNIST and CIFAR-10 datasets.

Proposed method

Analyze direct gradient poisoning: compute the gradient of the target model's loss on normal data w.r.t. the poisoned data, then update the poisoned inputs by gradient ascent.
Introduce a generator (autoencoder) that produces poisoned data and is updated by a reward function derived from loss differences.
Use the target NN as the discriminator, which computes losses and gradients and sends them back to the generator, so second-order derivatives are handled implicitly.
Formulate Algorithm 1 for the direct gradient method and Algorithm 2 for the generative method, with the latter reducing explicit second-derivative calculations.
Design a reward function based on the difference of losses across consecutive attacks to train the generator.
Propose a loss-based countermeasure (Algorithm 3) that triggers an alarm when input-induced losses exceed a threshold.
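The generator-plus-reward loop can be illustrated with a deliberately simplified sketch. This is not the paper's Algorithm 2. The paper's generator is an auto-encoder updated using gradients from the discriminator together with a loss-based reward; here, as assumptions, the generator is a clipped affine map, the discriminator is a logistic-regression victim that re-trains for a few steps on the generated batch, and the update rule is swapped for gradient-free hill climbing. A perturbed generator is kept only when the reward, the difference between its attack's clean loss and the previous best attack's, is positive. All names (`generate`, `attack_loss`, `train_generator`) and knobs (`rounds`, `sigma`) are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def clean_loss(w, X, y):
    """Mean logistic loss of the victim on clean validation data."""
    z = X @ w
    return float(np.mean(np.logaddexp(0.0, z) - y * z))


def retrain(w0, X, y, eta=0.5, steps=5):
    """Stand-in discriminator: the victim re-trains from w0 on the (poisoned) batch."""
    w = w0.copy()
    for _ in range(steps):
        w = w - eta * X.T @ (sigmoid(X @ w) - y) / len(y)
    return w


def generate(params, X):
    """Hypothetical generator: a clipped affine map in place of the paper's auto-encoder."""
    A, b = params
    return np.clip(X @ A.T + b, 0.0, 1.0)


def attack_loss(params, w0, X_src, y_src, X_val, y_val):
    """Clean validation loss after the victim re-trains on the generator's output."""
    return clean_loss(retrain(w0, generate(params, X_src), y_src), X_val, y_val)


def train_generator(w0, X_src, y_src, X_val, y_val, rounds=200, sigma=0.1, seed=0):
    """Reward-driven hill climbing: keep a perturbed generator only if its reward is positive."""
    rng = np.random.default_rng(seed)
    d = X_src.shape[1]
    params = (np.eye(d), np.zeros(d))  # start as the identity map, i.e. no poisoning
    best = attack_loss(params, w0, X_src, y_src, X_val, y_val)
    history = [best]
    for _ in range(rounds):
        A, b = params
        cand = (A + sigma * rng.standard_normal(A.shape),
                b + sigma * rng.standard_normal(b.shape))
        loss = attack_loss(cand, w0, X_src, y_src, X_val, y_val)
        reward = loss - best  # loss difference between consecutive attacks
        if reward > 0:
            params, best = cand, loss
        history.append(best)
    return params, history
```

Once trained, `generate(params, X_new)` turns any clean batch into poisoned data with a single forward pass, so the per-sample cost is amortized over the generator's training. That is one intuition for why a generative approach can be much faster than per-sample gradient ascent, though the sketch makes no claim about the paper's measured speedups.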

Experimental results

Research questions

RQ1: Can poisoning attacks be effectively executed on neural networks using gradient-based methods?
RQ2: Does a generative (autoencoder-based) approach significantly accelerate poisoned data generation compared to the direct gradient method?
RQ3: How does the attack affect model accuracy on standard datasets like MNIST and CIFAR-10?
RQ4: Can a low-overhead loss-based detector reliably identify poisoning inputs during training?

Key findings

The generative method speeds up poisoned data generation by up to 239.38× on CIFAR-10 and improves scalability to larger networks compared to the direct gradient method.
On MNIST, under the 1000-group setting, the best generative method reduces model accuracy to 16.59% (vs. 8.84% for the direct gradient method) while still offering substantial speed gains.
On CIFAR-10, the generative method shows similar or better attack effectiveness with much lower time overhead, especially as dataset size grows.
The direct gradient method is time-consuming, and its cost grows with input dimension and model complexity; the generative method mitigates this bottleneck.
A loss-based countermeasure detects poisoning by monitoring for loss spikes; when warnings become excessive, an accuracy check is triggered to confirm an attack, keeping overhead low.
Experiments demonstrate that poisoning attacks degrade target model performance and that the generator-guided approach is more scalable for larger networks.
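The loss-based countermeasure can be sketched as a small stateful monitor. This is a minimal sketch in the spirit of Algorithm 3, not its exact form: `LossMonitor`, `input_loss`, the status strings, and the `loss_threshold`/`max_warnings` values are hypothetical, and `accuracy_check` is assumed to be a caller-supplied function that returns `True` when held-out accuracy has dropped. Normal inputs take a cheap comparison; only a run of high-loss inputs pays for the expensive accuracy check.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence


def input_loss(probs: Sequence[float], label: int) -> float:
    """Cross-entropy loss the target model assigns to one incoming (input, label) pair."""
    return -math.log(max(probs[label], 1e-12))


@dataclass
class LossMonitor:
    """Hypothetical loss-threshold detector; thresholds are illustrative, not from the paper."""
    loss_threshold: float
    max_warnings: int
    accuracy_check: Callable[[], bool]  # True means held-out accuracy has dropped
    warnings: int = 0

    def observe(self, loss: float) -> str:
        if loss <= self.loss_threshold:
            return "ok"        # normal input: no extra work
        self.warnings += 1
        if self.warnings < self.max_warnings:
            return "warning"   # suspicious, but not enough evidence yet
        self.warnings = 0      # reset the counter after escalating
        # Only now pay for the expensive accuracy check.
        return "attack" if self.accuracy_check() else "warning"


if __name__ == "__main__":
    monitor = LossMonitor(loss_threshold=2.0, max_warnings=3, accuracy_check=lambda: True)
    print([monitor.observe(l) for l in [0.5, 2.5, 3.0, 0.1, 4.0]])
    # ['ok', 'warning', 'warning', 'ok', 'attack']
```

Resetting the warning counter after each escalation keeps repeated false alarms from triggering an accuracy check on every subsequent high-loss input; a real deployment would tune both knobs against the model's normal loss distribution.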


This review was created by AI and reviewed by human editors.