QUICK REVIEW

[Paper Review] The Secret Revealer: Generative Model-Inversion Attacks Against Deep Neural Networks

Yuheng Zhang, Ruoxi Jia|arXiv (Cornell University)|Nov 17, 2019

Adversarial Robustness in Machine Learning | References: 25 | Cited by: 34

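ONE-SENTENCE SUMMARY

This paper proposes the generative model-inversion (GMI) attack, which uses a GAN prior to reconstruct private training data from DNNs, significantly outperforms prior methods, and shows that standard differential privacy offers only limited protection.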

ABSTRACT

This paper studies model-inversion attacks, in which the access to a model is abused to infer information about the training data. Since its first introduction, such attacks have raised serious concerns given that training data usually contain privacy-sensitive information. Thus far, successful model-inversion attacks have only been demonstrated on simple models, such as linear regression and logistic regression. Previous attempts to invert neural networks, even the ones with simple architectures, have failed to produce convincing results. We present a novel attack method, termed the generative model-inversion attack, which can invert deep neural networks with high success rates. Rather than reconstructing private training data from scratch, we leverage partial public information, which can be very generic, to learn a distributional prior via generative adversarial networks (GANs) and use it to guide the inversion process. Moreover, we theoretically prove that a model's predictive power and its vulnerability to inversion attacks are indeed two sides of the same coin: highly predictive models are able to establish a strong correlation between features and labels, which coincides exactly with what an adversary exploits to mount the attacks. Our extensive experiments demonstrate that the proposed attack improves identification accuracy over the existing work by about 75% for reconstructing face images from a state-of-the-art face recognition classifier. We also show that differential privacy, in its canonical form, is of little avail to defend against our attacks.

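MOTIVATION AND OBJECTIVES

Expose the privacy risk that model-inversion (MI) attacks can leak training data from deep neural networks.
Propose a generative MI (GMI) attack that uses public data to learn the data manifold via a GAN.
Theoretically link a model's predictive power to its vulnerability to inversion attacks.
Empirically demonstrate significant improvements over existing MI attacks across multiple tasks.
Evaluate how well differential privacy defends against the proposed GMI attack.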

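PROPOSED METHOD

Train a GAN (a Wasserstein GAN with a diversity loss) on public data to learn a realistic data manifold.
Run a two-stage attack: public knowledge distillation (training the generator and discriminator on public data), then secret revelation (optimizing the latent variable z so that the generated image attains maximum likelihood under the target model).
In the second stage, optimize z using the prior loss L_prior(z) and the identity loss L_id(z) to recover the sensitive features.
L_wgan enforces realism, and the added diversity term L_div keeps generated images distinguishable, and therefore informative, when projected into the target network's feature space.
The losses are given explicitly: L_wgan(G, D) = E_x[D(x)] − E_z[D(G(z))]; L_div(G) = E_{z1,z2}[ ||F(G(z1)) − F(G(z2))|| / ||z1 − z2|| ]; L_prior(z) = −D(G(z)); L_id(z) = −log C(G(z)). Here G is the generator, D the discriminator, F the target network's feature extractor, and C(G(z)) the probability the target network assigns to the target label.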
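The second stage can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: `generator`, `discriminator`, and `classifier` are hypothetical toy functions standing in for the trained G, D, and target network C; the weight `lam` on the identity loss, the learning rate, and the step count are arbitrary; and plain gradient descent with finite-difference gradients replaces the backpropagation a real implementation would use. Only the two loss definitions, L_prior(z) = −D(G(z)) and L_id(z) = −log C(G(z)), come from the text above.

```python
import math

# Hypothetical toy stand-ins for the trained networks (assumptions, not the paper's models).
def generator(z):
    """Toy G: maps a 2-D latent vector to a 3-D 'image'."""
    return [math.tanh(z[0]), math.tanh(z[1]), math.tanh(z[0] + z[1])]

def discriminator(x):
    """Toy D: higher score means 'more realistic' (here, closer to the origin)."""
    return -0.1 * sum(v * v for v in x)

def classifier(x):
    """Toy target network C: softmax probabilities over two labels."""
    logits = [2.0 * x[0] - x[1], -2.0 * x[0] + x[1]]
    peak = max(logits)
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def prior_loss(z):
    """L_prior(z) = -D(G(z)): penalizes unrealistic samples."""
    return -discriminator(generator(z))

def identity_loss(z, target):
    """L_id(z) = -log C(G(z)): penalizes low target-label probability."""
    return -math.log(classifier(generator(z))[target])

def total_loss(z, target, lam):
    """Weighted sum of the two losses; the weight lam is an assumed hyperparameter."""
    return prior_loss(z) + lam * identity_loss(z, target)

def numerical_grad(f, z, eps=1e-5):
    """Central finite differences, used here instead of backpropagation."""
    grad = []
    for i in range(len(z)):
        plus, minus = list(z), list(z)
        plus[i] += eps
        minus[i] -= eps
        grad.append((f(plus) - f(minus)) / (2 * eps))
    return grad

def invert(z0, target, lam=1.0, steps=300, lr=0.05):
    """Gradient descent on z only; G, D, and C stay fixed throughout."""
    z = list(z0)
    for _ in range(steps):
        grad = numerical_grad(lambda v: total_loss(v, target, lam), z)
        z = [zi - lr * gi for zi, gi in zip(z, grad)]
    return z

z_start = [-0.5, 0.5]  # fixed start for reproducibility; an attack would sample z randomly
z_hat = invert(z_start, target=0)
print("target probability before:", classifier(generator(z_start))[0])
print("target probability after: ", classifier(generator(z_hat))[0])
```

The point of the sketch is the division of labor: the prior term keeps the candidate close to what the discriminator considers realistic, while the identity term pulls it toward inputs the target network associates with the chosen label.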

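EXPERIMENTAL RESULTS

Research Questions

RQ1: Can a generative prior learned from public data enable effective model inversion against deep networks?
RQ2: How does a model's predictive power relate to its vulnerability to inversion attacks?
RQ3: Do standard privacy defenses such as differential privacy weaken the GMI attack?
RQ4: Which factors (similarity of the public data, auxiliary knowledge) affect the success of the GMI attack?
RQ5: How does the GMI attack perform across architectures and datasets?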

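Key Findings

GMI substantially outperforms the existing model-inversion (EMI) attack; for example, it raises Top-5 attack accuracy on CelebA face recognition models by up to 75%.
Under typical DP settings, differential privacy provides almost no defense against GMI.
Public knowledge distillation via a GAN provides an effective prior; attack performance drops when the public and private data distributions differ.
Across architectures, higher predictive power correlates positively with vulnerability to MI attacks.
On CelebA, GMI consistently outperforms EMI and pure image inpainting (PII) against VGG16, ResNet-152, and face.evoLVe models, and its reconstructions are more realistic and better preserve identity features.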
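To make the Top-5 attack accuracy figure concrete, here is a minimal sketch of how a top-k attack accuracy could be computed. It assumes each reconstruction is scored by some evaluation classifier that returns one score per label, and that an attack counts as successful when the target label is among the k highest-scoring labels. The function name `top_k_attack_accuracy` and the example scores are hypothetical and are not taken from the paper.

```python
def top_k_attack_accuracy(score_rows, target_labels, k=5):
    """Fraction of reconstructions whose target label is among the k top-scoring labels.

    score_rows: one list of per-label scores for each reconstructed image.
    target_labels: the label each reconstruction was meant to recover.
    """
    if len(score_rows) != len(target_labels):
        raise ValueError("score_rows and target_labels must have the same length")
    if not score_rows:
        return 0.0
    hits = 0
    for scores, target in zip(score_rows, target_labels):
        # Rank label indices from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda label: scores[label], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(score_rows)

# Made-up scores for three reconstructions over four labels.
example_scores = [
    [0.10, 0.60, 0.20, 0.10],
    [0.40, 0.30, 0.20, 0.10],
    [0.05, 0.05, 0.10, 0.80],
]
example_targets = [2, 1, 3]

top1 = top_k_attack_accuracy(example_scores, example_targets, k=1)
top2 = top_k_attack_accuracy(example_scores, example_targets, k=2)
print(top1, top2)
```

A larger k is more forgiving: a reconstruction that the classifier ranks second for the target label counts as a miss at k=1 but as a hit at k=2.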

This review was generated by AI and checked by a human editor.