QUICK REVIEW

[Paper Digest] Defending Model Inversion and Membership Inference Attacks via Prediction Purification

Ziqi Yang, Bin Shao|arXiv (Cornell University)|May 8, 2020

Adversarial Robustness in Machine Learning | References: 71 | Cited by: 50

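One-Sentence Summary

This paper proposes a unified purification framework that defends against model inversion and membership inference attacks by passing the target model's prediction scores through an autoencoder purifier, which can optionally be specialized with adversarial components.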

ABSTRACT

Neural networks are susceptible to data inference attacks such as the model inversion attack and the membership inference attack, where the attacker could infer the reconstruction and the membership of a data sample from the confidence scores predicted by the target classifier. In this paper, we propose a unified approach, namely purification framework, to defend data inference attacks. It purifies the confidence score vectors predicted by the target classifier by reducing their dispersion. The purifier can be further specialized in defending a particular attack via adversarial learning. We evaluate our approach on benchmark datasets and classifiers. We show that when the purifier is dedicated to one attack, it naturally defends the other one, which empirically demonstrates the connection between the two attacks. The purifier can effectively defend both attacks. For example, it can reduce the membership inference accuracy by up to 15% and increase the model inversion error by a factor of up to 4. Besides, it incurs less than 0.4% classification accuracy drop and less than 5.5% distortion to the confidence scores.

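Motivation and Objectives

Motivate and unify the defense against two data inference attacks: model inversion and membership inference.
Reduce the dispersion of confidence score vectors to make both attacks less effective.
Preserve the classifier's utility, with almost no accuracy loss and limited distortion of the scores.
Specialize the purifier against a single attack through adversarial learning.
Demonstrate empirical effectiveness on benchmark datasets and architectures.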
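
To make the notion of "dispersion" concrete, here is a minimal sketch. This digest does not say how the paper measures dispersion, so the metric below (mean squared Euclidean distance from the centroid) is an assumption. The function name `dispersion` and the sample score vectors are hypothetical.

```python
def dispersion(score_vectors):
    """Mean squared Euclidean distance of each score vector from the centroid.

    Assumed metric for illustration only; the paper may define dispersion differently.
    """
    if not score_vectors:
        raise ValueError("need at least one score vector")
    n = len(score_vectors)
    dim = len(score_vectors[0])
    # Component-wise mean of all score vectors.
    centroid = [sum(v[i] for v in score_vectors) / n for i in range(dim)]
    # Average squared distance to that centroid.
    return sum(
        sum((v[i] - centroid[i]) ** 2 for i in range(dim)) for v in score_vectors
    ) / n


# Hypothetical confidence scores for three samples of the same class.
raw = [[0.98, 0.01, 0.01], [0.60, 0.30, 0.10], [0.75, 0.05, 0.20]]
purified = [[0.80, 0.12, 0.08], [0.74, 0.16, 0.10], [0.78, 0.10, 0.12]]

print(dispersion(raw) > dispersion(purified))
```

Here the purified vectors cluster more tightly than the raw ones. The intuition is that tighter clusters leave an attacker fewer per-sample differences to exploit.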

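Proposed Method

Introduce a purifier G (an autoencoder) that reconstructs the prediction scores toward the distribution pattern that scores exhibit on non-member data.
Train G on a reference dataset of non-members to minimize the reconstruction loss while preserving the predicted labels.
Specialize G against model inversion through a minimax game with an adversarial model H that tries to reconstruct the input from the purified scores.
Specialize G against membership inference with a discriminator I that distinguishes real scores from reconstructed scores; G is trained to fool I.
Optionally combine both specializations by jointly training G, H, and I, defending against both attacks while preserving utility.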
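
The base training objective for G (reconstruction loss plus label preservation) can be sketched for a single score vector as follows. This is an illustration under stated assumptions, not the authors' implementation. Mean squared error for reconstruction, cross-entropy for label preservation, and the weight `lam` are all assumptions, and the adversarial terms for H and I are left out. The name `purifier_loss` is hypothetical.

```python
import math


def purifier_loss(raw_scores, purified_scores, lam=1.0, eps=1e-12):
    """Reconstruction loss plus a label-preservation term for one score vector.

    raw_scores:      confidence vector produced by the target classifier
    purified_scores: output of the purifier G for that vector
    lam:             assumed weight of the label-preservation term
    """
    if len(raw_scores) != len(purified_scores):
        raise ValueError("score vectors must have the same length")
    # Reconstruction term: mean squared error between purified and raw scores.
    recon = sum(
        (p - r) ** 2 for p, r in zip(purified_scores, raw_scores)
    ) / len(raw_scores)
    # Label-preservation term: cross-entropy against the class that the
    # target classifier predicted (the argmax of the raw scores).
    label = max(range(len(raw_scores)), key=raw_scores.__getitem__)
    cross_entropy = -math.log(max(purified_scores[label], eps))
    return recon + lam * cross_entropy


raw = [0.90, 0.05, 0.05]
keeps_label = [0.60, 0.20, 0.20]   # smoother, same predicted class
flips_label = [0.20, 0.60, 0.20]   # predicted class changed

print(purifier_loss(raw, keeps_label) < purifier_loss(raw, flips_label))
```

An output that flips the predicted class is penalized by both terms. This is why a purifier trained this way can smooth the scores without changing the classifier's decision.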

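Experimental Results

Research Questions

RQ1: Are model inversion and membership inference attacks related, and can a single purification approach defend against both?
RQ2: Does purifying prediction scores reduce their dispersion enough to mitigate both attacks while preserving classification accuracy?
RQ3: How does specializing the purifier through adversarial learning affect its defense against each attack?
RQ4: How does the purification framework compare with existing defenses in terms of accuracy loss and efficiency?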

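Key Findings

Purification reduces the dispersion of confidence score vectors, which lowers the effectiveness of both attacks.
A purifier specialized against one attack also improves the defense against the other.
Purification reduces membership inference accuracy by up to 15%.
Model inversion error increases by a factor of up to 4.
Under purification, classification accuracy drops by less than 0.4% and the confidence scores are distorted by less than 5.5%.
Prediction is significantly faster than with MemGuard (4,636 times faster in the reported comparison).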
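
The two utility figures above (accuracy drop and score distortion) can be approximated with simple checks like the ones below. This is a sketch under assumptions. Label agreement is only a proxy for accuracy drop, because measuring accuracy itself needs ground-truth labels. The L1 distance used for distortion is also an assumption, since this digest does not give the paper's exact distortion metric. All function names and sample values are hypothetical.

```python
def argmax(scores):
    """Index of the largest confidence score."""
    return max(range(len(scores)), key=scores.__getitem__)


def label_agreement(raw_batch, purified_batch):
    """Fraction of samples whose predicted label is unchanged by purification."""
    same = sum(argmax(r) == argmax(p) for r, p in zip(raw_batch, purified_batch))
    return same / len(raw_batch)


def mean_distortion(raw_batch, purified_batch):
    """Average L1 distance between raw and purified score vectors (assumed metric)."""
    total = sum(
        sum(abs(r_i - p_i) for r_i, p_i in zip(r, p))
        for r, p in zip(raw_batch, purified_batch)
    )
    return total / len(raw_batch)


# Hypothetical scores for two samples, before and after purification.
raw_batch = [[0.90, 0.05, 0.05], [0.20, 0.70, 0.10]]
purified_batch = [[0.80, 0.10, 0.10], [0.25, 0.60, 0.15]]

print(label_agreement(raw_batch, purified_batch))
print(mean_distortion(raw_batch, purified_batch))
```

A high label agreement together with a low mean distortion would suggest that the purifier changes what an attacker can observe while leaving legitimate users' predictions largely intact.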

This digest was generated by AI and reviewed by human editors.