QUICK REVIEW

[Paper Review] Defending Model Inversion and Membership Inference Attacks via Prediction Purification

Ziqi Yang, Bin Shao|arXiv (Cornell University)|May 8, 2020

Adversarial Robustness in Machine Learning | 71 references | 50 citations

TL;DR

The paper introduces a unified purification framework that post-processes the target model's confidence scores to defend against both model inversion and membership inference attacks. The purifier is an autoencoder that can optionally be specialized for either attack through adversarial training.

ABSTRACT

Neural networks are susceptible to data inference attacks such as the model inversion attack and the membership inference attack, where the attacker could infer the reconstruction and the membership of a data sample from the confidence scores predicted by the target classifier. In this paper, we propose a unified approach, namely purification framework, to defend data inference attacks. It purifies the confidence score vectors predicted by the target classifier by reducing their dispersion. The purifier can be further specialized in defending a particular attack via adversarial learning. We evaluate our approach on benchmark datasets and classifiers. We show that when the purifier is dedicated to one attack, it naturally defends the other one, which empirically demonstrates the connection between the two attacks. The purifier can effectively defend both attacks. For example, it can reduce the membership inference accuracy by up to 15% and increase the model inversion error by a factor of up to 4. Besides, it incurs less than 0.4% classification accuracy drop and less than 5.5% distortion to the confidence scores.

Motivation & Objective

Unify the defense against two data inference attacks, model inversion and membership inference, in a single framework.
Reduce dispersion of confidence score vectors to lower attack efficacy.
Preserve classifier utility with negligible accuracy loss and limited score distortion.
Enable specialization of the purifier for individual attacks via adversarial learning.
Demonstrate empirical effectiveness across benchmark datasets and architectures.

Proposed method

Introduce a purifier G, an autoencoder that takes the classifier's confidence score vector as input and reconstructs it toward the score patterns observed on non-member data.
Train G on a reference non-member dataset to minimize reconstruction loss and preserve predicted labels.
Specialize G for model inversion via a min-max game with an adversarial model H that tries to reconstruct inputs from purified scores.
Specialize G for membership inference via a discriminator I that distinguishes real vs reconstructed scores, training G to fool I.
Optionally combine both specializations by jointly training G, H, and I to defend both attacks with preserved utility.
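The training objective described above can be sketched as a single per-sample loss for G. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the use of mean squared error for reconstruction, cross-entropy for label preservation, and the weights `lam`, `alpha`, and `beta` are all hypothetical choices made for the example. The adversaries H and I are not modeled here; their outputs are passed in as plain numbers.

```python
import math

def mse(a, b):
    """Mean squared error between two equal-length score vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def cross_entropy(scores, label, eps=1e-12):
    """Negative log-likelihood of `label` under a confidence score vector."""
    return -math.log(max(scores[label], eps))

def purifier_loss(original, purified, inversion_error=0.0,
                  discriminator_real_prob=1.0,
                  lam=1.0, alpha=0.0, beta=0.0, eps=1e-12):
    """Per-sample loss for the purifier G (hypothetical formulation).

    original                -- confidence scores from the target classifier
    purified                -- G's output for those scores
    inversion_error         -- reconstruction error of the inversion model H
                               on the purified scores (assumed given)
    discriminator_real_prob -- probability that discriminator I assigns to
                               "real" for the purified scores (assumed given)
    lam, alpha, beta        -- assumed weights for the individual terms
    """
    # Label predicted by the target classifier, which G should preserve.
    label = max(range(len(original)), key=original.__getitem__)

    reconstruction = mse(purified, original)
    keep_label = cross_entropy(purified, label)

    # Min-max game with H: G benefits when H's inversion error is large.
    inversion_term = -alpha * inversion_error

    # Game with I: G benefits when I believes the purified scores are real.
    membership_term = -beta * math.log(max(discriminator_real_prob, eps))

    return reconstruction + lam * keep_label + inversion_term + membership_term

if __name__ == "__main__":
    scores = [0.9, 0.05, 0.05]
    purified = [0.6, 0.2, 0.2]
    print("base loss:", purifier_loss(scores, purified))
    print("with inversion term:",
          purifier_loss(scores, purified, inversion_error=0.5, alpha=1.0))
```

Setting `alpha` and `beta` to zero gives the unspecialized purifier; enabling one or both corresponds to the specialized and combined variants in the list above. In a real system each term would be averaged over a batch and G, H, and I would be updated in alternation.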

Experimental results

Research questions

RQ1: Are model inversion and membership inference attacks connected, and can a single purification approach defend both?
RQ2: Can prediction score purification reduce dispersion to mitigate both attacks while preserving classification accuracy?
RQ3: What is the impact of specialized purification (via adversarial learning) on defense effectiveness against each attack?
RQ4: How does the proposed purification framework compare to existing defenses in terms of accuracy loss and efficiency?

Key findings

Purification reduces dispersion of confidence score vectors, reducing attack efficacy for both attacks.
When specialized to one attack, the purifier naturally improves defense against the other attack.
Membership inference accuracy can be reduced by up to 15% using purification.
Model inversion error can be increased by up to a factor of 4.
Classification accuracy loss is under 0.4% and confidence score distortion under 5.5% with purification.
Prediction is substantially faster than with MemGuard (e.g., 4,636x faster in the reported comparison).
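The utility and dispersion figures above depend on how each quantity is measured, and this review does not give the paper's exact definitions. The following sketch shows one plausible way to compute them on a batch of score vectors. The three functions and their definitions (label agreement, mean L1 distortion, and mean Euclidean distance to the batch centroid as a dispersion proxy) are assumptions made for illustration only.

```python
import math

def argmax(vector):
    """Index of the largest entry in a score vector."""
    return max(range(len(vector)), key=vector.__getitem__)

def label_agreement(original_batch, purified_batch):
    """Fraction of samples whose predicted label survives purification."""
    same = sum(argmax(o) == argmax(p)
               for o, p in zip(original_batch, purified_batch))
    return same / len(original_batch)

def mean_distortion(original_batch, purified_batch):
    """Average L1 distance between original and purified score vectors."""
    total = sum(sum(abs(a - b) for a, b in zip(o, p))
                for o, p in zip(original_batch, purified_batch))
    return total / len(original_batch)

def dispersion(batch):
    """Mean Euclidean distance of each score vector from the batch centroid."""
    dims = len(batch[0])
    centroid = [sum(v[i] for v in batch) / len(batch) for i in range(dims)]
    distances = [math.sqrt(sum((v[i] - centroid[i]) ** 2 for i in range(dims)))
                 for v in batch]
    return sum(distances) / len(batch)

if __name__ == "__main__":
    # Toy two-class scores, invented for the example.
    original = [[0.9, 0.1], [0.2, 0.8]]
    purified = [[0.7, 0.3], [0.4, 0.6]]
    print("label agreement:", label_agreement(original, purified))
    print("mean distortion:", mean_distortion(original, purified))
    print("dispersion before:", dispersion(original))
    print("dispersion after:", dispersion(purified))
```

On the toy data the purified scores keep every predicted label while sitting closer to their centroid, which is the qualitative behavior the findings describe. The numbers themselves are illustrative and are not taken from the paper.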

This review was created by AI and reviewed by human editors.