QUICK REVIEW

[Paper Review] Deep Neural Network Fingerprinting by Conferrable Adversarial Examples

Nils Lukas, Yuxuan Zhang|arXiv (Cornell University)|Dec 2, 2019

Adversarial Robustness in Machine Learning|References: 62|Cited by: 23

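ONE-SENTENCE SUMMARY

This paper proposes a new fingerprinting method for deep neural network classifiers based on conferrable adversarial examples: transferable adversarial inputs that are misclassified as the target class only by surrogate models, not by independently trained reference models. The method reaches a perfect ROC AUC of 1.0 in verifying CIFAR-10 surrogate models, compared with 0.63 for prior fingerprints, and is robust to model extraction, fine-tuning, pruning, and distillation attacks.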

ABSTRACT

In Machine Learning as a Service, a provider trains a deep neural network and gives many users access. The hosted (source) model is susceptible to model stealing attacks, where an adversary derives a surrogate model from API access to the source model. For post hoc detection of such attacks, the provider needs a robust method to determine whether a suspect model is a surrogate of their model. We propose a fingerprinting method for deep neural network classifiers that extracts a set of inputs from the source model so that only surrogates agree with the source model on the classification of such inputs. These inputs are a subclass of transferable adversarial examples which we call conferrable adversarial examples that exclusively transfer with a target label from a source model to its surrogates. We propose a new method to generate these conferrable adversarial examples. We present an extensive study on the irremovability of our fingerprint against fine-tuning, weight pruning, retraining, retraining with different architectures, three model extraction attacks from related work, transfer learning, adversarial training, and two new adaptive attacks. Our fingerprint is robust against distillation, related model extraction attacks, and even transfer learning when the attacker has no access to the model provider's dataset. Our fingerprint is the first method that reaches a ROC AUC of 1.0 in verifying surrogates, compared to a ROC AUC of 0.63 by previous fingerprints.

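RESEARCH MOTIVATION AND OBJECTIVES

Address the threat of model stealing in Machine Learning as a Service (MLaaS), where an adversary derives a surrogate model through API access to the source model.
Develop a passive, robust fingerprinting mechanism that can detect a stolen model after the fact, even once the model has been extracted or modified.
Identify and exploit a subclass of adversarial examples, conferrable adversarial examples, that transfer only to surrogate models and not to independently trained reference models.
Evaluate the fingerprint's robustness against a range of attacks, including adversarial training and transfer learning, and compare its performance with prior methods.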

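PROPOSED METHOD

Introduce a new class of targeted, transferable adversarial examples, called "conferrable adversarial examples", that cause the targeted misclassification only in surrogate models and not in reference models.
Introduce a conferrability metric that quantifies how exclusively an adversarial example transfers to surrogate models rather than to reference models.
Develop an ensemble adversarial attack, the Conferrable Ensemble Method (CEM), that optimizes for high conferrability by maximizing transfer to surrogate models while minimizing transfer to reference models.
Use the generated conferrable adversarial examples as a persistent fingerprint to verify whether a suspect model is a surrogate of the source model.
Design a verification mechanism that distinguishes surrogates from reference models by how consistently a suspect model's predictions match the target labels on a set of conferrable adversarial examples.
Evaluate robustness in an extensive empirical study covering model extraction, fine-tuning, pruning, distillation, retraining, and transfer learning attacks.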
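
The conferrability metric and the verification step described above can be illustrated with a small sketch. It assumes one plausible formalization: an input's transfer rate is the fraction of models in an ensemble that predict the target label, and its conferrability score is the surrogate transfer rate multiplied by one minus the reference transfer rate. The function names, the toy predictions, and the 0.5 decision threshold are hypothetical choices for illustration, not values taken from the paper.

```python
import numpy as np

def transfer_rate(preds, target):
    """Fraction of models in an ensemble that predict `target` for one input."""
    return float(np.mean(np.asarray(preds) == target))

def conferrability_score(surrogate_preds, reference_preds, target):
    """Assumed score: high only if surrogates predict the target and references do not."""
    return transfer_rate(surrogate_preds, target) * (
        1.0 - transfer_rate(reference_preds, target)
    )

def cae_acc(suspect_preds, target_labels):
    """Share of fingerprint inputs on which the suspect model outputs the target label."""
    return float(np.mean(np.asarray(suspect_preds) == np.asarray(target_labels)))

def is_surrogate(suspect_preds, target_labels, threshold=0.5):
    """Flag the suspect as a surrogate if its CAEAcc reaches the (hypothetical) threshold."""
    return bool(cae_acc(suspect_preds, target_labels) >= threshold)

# Toy example: predictions of 4 surrogates and 4 reference models on one input
# whose target label is 3.
score = conferrability_score([3, 3, 3, 1], [0, 3, 5, 5], target=3)
print(score)  # 0.5625 (= 0.75 * (1 - 0.25))

# Toy verification: a suspect model matches the target label on 3 of 4 fingerprint inputs.
print(is_surrogate([3, 3, 1, 3], [3, 3, 3, 3]))  # True
```

In this sketch an input that fools every model, surrogate or reference, scores 0, which captures why plain transferable adversarial examples are not useful as fingerprints.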

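EXPERIMENTAL RESULTS

Research Questions

RQ1: Can a subclass of transferable adversarial examples be identified that transfers only to surrogate models and not to independently trained reference models?
RQ2: How effective is the proposed conferrable adversarial example fingerprint at detecting surrogate models under a variety of model extraction and modification attacks?
RQ3: What are the fingerprint's limitations under adaptive attacks, such as adversarial training and transfer learning with a pretrained model and in-domain data?
RQ4: To what extent does the fingerprint remain robust when the attacker uses knowledge distillation, retraining, or fine-tuning to evade detection?
RQ5: How does the fingerprint's detection performance, in particular its ROC AUC on retrained models, compare quantitatively with prior work?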

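Key Findings

The proposed fingerprint achieves a perfect ROC AUC of 1.0 in verifying retrained CIFAR-10 surrogate models, well above prior work (ROC AUC = 0.63).
The fingerprint is robust to model extraction attacks, including attacks based on knowledge distillation, Knockoff Nets, and the methods of Jagielski and Papernot.
The fingerprint is robust to fine-tuning, weight pruning, and retraining with different architectures; the average difference in CAEAcc (accuracy on the conferrable adversarial examples) between surrogate and reference models is about 30%.
Adversarial training from scratch does not fully remove the fingerprint, but it reduces CAEAcc to 15% at ε = 0.025, indicating that the fingerprint is vulnerable to this particular attack.
The fingerprint remains robust to transfer learning when the attacker lacks the provider's dataset (e.g., CINIC-10); if the attacker has an ImageNet32-pretrained model and CIFAR-10 data, however, the fingerprint can be removed.
A confidence analysis shows that the fingerprint remains non-evasive under the detection method proposed by Hitaj et al. (2019), confirming its robustness to known evasion techniques.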
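
To make the ROC AUC figures above concrete, the sketch below computes ROC AUC from per-model CAEAcc scores using the rank-based (Mann-Whitney) formulation: the probability that a randomly chosen surrogate scores higher than a randomly chosen reference model, with ties counted as one half. The function name and all score values are made up for illustration and are not results from the paper.

```python
import numpy as np

def roc_auc(positive_scores, negative_scores):
    """ROC AUC as P(positive > negative) + 0.5 * P(tie), over all score pairs."""
    pos = np.asarray(positive_scores, dtype=float)[:, None]  # column vector
    neg = np.asarray(negative_scores, dtype=float)[None, :]  # row vector
    wins = (pos > neg).sum()
    ties = (pos == neg).sum()
    return float((wins + 0.5 * ties) / (pos.size * neg.size))

# Hypothetical CAEAcc values: every surrogate scores above every reference model,
# so a single threshold separates the two groups perfectly.
surrogate_scores = [0.90, 0.80, 0.85]
reference_scores = [0.50, 0.55, 0.60]
print(roc_auc(surrogate_scores, reference_scores))  # 1.0

# Overlapping scores: only 3 of the 4 surrogate/reference pairs are ordered correctly.
print(roc_auc([0.6, 0.4], [0.5, 0.3]))  # 0.75
```

An AUC of 1.0 therefore means that some decision threshold on CAEAcc separates all surrogates from all reference models, while a value near 0.5 means the scores barely distinguish the two groups.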

This review was generated by AI and reviewed by human editors.