QUICK REVIEW

[Paper Review] High Accuracy and High Fidelity Extraction of Neural Networks

Matthew Jagielski, Nicholas Carlini|arXiv (Cornell University)|Sep 3, 2019

Adversarial Robustness in Machine Learning | References: 57 | Cited by: 55

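One-Sentence Summary

The paper frames model extraction around two objectives, accuracy and fidelity; shows that learning-based attacks make accuracy extraction more query-efficient; presents the first practical functionally-equivalent extraction attack for direct weight extraction; and demonstrates that model extraction is practical against large production-grade models.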

ABSTRACT

In a model extraction attack, an adversary steals a copy of a remotely deployed machine learning model, given oracle prediction access. We taxonomize model extraction attacks around two objectives: *accuracy*, i.e., performing well on the underlying learning task, and *fidelity*, i.e., matching the predictions of the remote victim classifier on any input. To extract a high-accuracy model, we develop a learning-based attack exploiting the victim to supervise the training of an extracted model. Through analytical and empirical arguments, we then explain the inherent limitations that prevent any learning-based strategy from extracting a truly high-fidelity model---i.e., extracting a functionally-equivalent model whose predictions are identical to those of the victim model on all possible inputs. Addressing these limitations, we expand on prior work to develop the first practical functionally-equivalent extraction attack for direct extraction (i.e., without training) of a model's weights. We perform experiments both on academic datasets and a state-of-the-art image classifier trained with 1 billion proprietary images. In addition to broadening the scope of model extraction research, our work demonstrates the practicality of model extraction attacks against production-grade systems.
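The two objectives defined in the abstract can be made concrete with a small sketch. The labels below are made up for illustration, and the function names `accuracy` and `fidelity` are hypothetical helpers, not taken from the paper. Fidelity as defined above concerns any input; here it is only estimated on a finite sample.

```python
def accuracy(extracted_preds, true_labels):
    """Fraction of inputs where the extracted model matches the ground truth."""
    matches = sum(p == y for p, y in zip(extracted_preds, true_labels))
    return matches / len(true_labels)

def fidelity(extracted_preds, victim_preds):
    """Fraction of inputs where the extracted model agrees with the victim,
    whether or not the victim is correct."""
    matches = sum(p == v for p, v in zip(extracted_preds, victim_preds))
    return matches / len(victim_preds)

# Toy predictions (illustrative only).
true_labels     = [0, 1, 1, 0, 1]
victim_preds    = [0, 1, 0, 0, 1]   # the victim errs on the third input
extracted_preds = [0, 1, 1, 0, 1]   # the extracted model is correct everywhere

acc = accuracy(extracted_preds, true_labels)    # 1.0
fid = fidelity(extracted_preds, victim_preds)   # 0.8
```

In this toy case the extracted model has perfect accuracy but imperfect fidelity, because it does not reproduce the victim's mistake. The two objectives can therefore pull in different directions.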

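Motivation and Objectives

Motivate and define two adversarial objectives in model extraction: accuracy and fidelity.
Organize existing extraction attacks into a two-dimensional space of objectives.
Show the limitations of learning-based extraction in achieving high fidelity.
Develop a practical functionally-equivalent extraction attack for direct weight recovery.
Demonstrate the attacks on academic datasets and on a state-of-the-art production classifier.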

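Proposed Method

Taxonomize attack objectives and adversary capabilities (accuracy, fidelity, functionally-equivalent extraction).
Develop learning-based extraction that uses the victim model as a labeling oracle to maximize task accuracy.
Demonstrate the inherent limitations of learning-based strategies in achieving high fidelity.
Propose a practical functionally-equivalent extraction attack that recovers the weights of a two-layer network through input-output access alone.
Evaluate the attacks on an ImageNet-scale model (WSL) and on standard datasets (SVHN, CIFAR-10).
Explore semi-supervised and hybrid approaches (rotation loss, MixMatch) to improve query efficiency.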
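The learning-based strategy can be sketched in a few lines. This is a minimal illustration under strong assumptions, not the paper's experimental setup: the victim is a hypothetical linear classifier, the oracle returns labels only, and the attacker trains a logistic regression by plain gradient descent. All names (`victim_oracle`, `_w_secret`) and hyperparameters are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical victim: a linear classifier whose weights the attacker never sees.
_w_secret = rng.normal(size=5)

def victim_oracle(x):
    """Label-only oracle access: returns the victim's predicted class (0 or 1)."""
    return (x @ _w_secret > 0).astype(float)

# 1. The attacker gathers unlabeled inputs and labels them by querying the victim.
queries = rng.normal(size=(500, 5))
labels = victim_oracle(queries)

# 2. The attacker trains its own model (logistic regression) on those labels.
w = np.zeros(5)
for _ in range(500):
    probs = 1.0 / (1.0 + np.exp(-(queries @ w)))
    grad = queries.T @ (probs - labels) / len(queries)
    w -= 0.5 * grad

# 3. Fidelity is estimated as agreement with the victim on fresh inputs.
test_inputs = rng.normal(size=(2000, 5))
extracted_preds = (test_inputs @ w > 0).astype(float)
fidelity = float(np.mean(extracted_preds == victim_oracle(test_inputs)))
```

A model trained this way usually agrees with the victim on most inputs, but nothing forces it to agree on every input. That gap is what motivates a separate, direct weight-recovery attack.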

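Experimental Results

Research Questions

RQ1: Under realistic query-access constraints, can model extraction achieve functionally-equivalent fidelity?
RQ2: How does learning-based extraction compare with fidelity-focused extraction in query efficiency and scalability?
RQ3: What are the fundamental limits of learning-based extraction aimed at high fidelity, and can direct weight recovery be achieved without data side channels?
RQ4: How do unlabeled data and semi-supervised techniques affect the feasibility of extraction attacks on large models?
RQ5: Do production-grade models trained on large-scale proprietary data remain vulnerable to practical extraction attacks under black-box access?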

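Key Findings

Learning-based extraction improves accuracy extraction: it is more query-efficient than prior methods and scales to models with millions of parameters.
Unlabeled data and semi-supervised techniques (rotation loss, MixMatch) substantially improve extraction performance with fewer queries.
The functionally-equivalent extraction attack is practical for directly recovering the weights of a two-layer network through input-output access alone.
Learning-based extraction has inherent fidelity limitations; experiments indicate that fidelity tops out at roughly 93% even when sources of nondeterminism are controlled.
With 250 queries, MixMatch-based extraction reaches near-oracle accuracy on SVHN and CIFAR-10, requiring far fewer labeled queries.
The work demonstrates that model extraction is feasible against production-grade systems and provides theoretical bounds on the hardness of extraction.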
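This summary does not spell out the steps of the functionally-equivalent attack, so the following is only an illustrative sketch of one ingredient such attacks on ReLU networks can rely on. It is not the paper's algorithm. A two-layer ReLU network is piecewise linear, so its output has a kink wherever a hidden unit's pre-activation crosses zero, and a kink can be located from queries alone. The weights, the query line, and all function names are assumptions made up for the example, and the search assumes the interval contains exactly one kink.

```python
import numpy as np

# Hypothetical two-layer ReLU network f(x) = a . relu(W x + b); values are made up.
W = np.array([[1.0, -1.0], [0.5, 2.0]])
b = np.array([0.25, -3.0])
a = np.array([1.5, -0.8])

def victim(x):
    """Black-box oracle: the attacker only sees this scalar output."""
    return float(a @ np.maximum(W @ x + b, 0.0))

def along_line(t, x0=np.array([0.0, 0.0]), d=np.array([1.0, 0.0])):
    """Query the victim at the point x0 + t * d."""
    return victim(x0 + t * d)

def is_linear(f, lo, hi, eps=1e-9):
    """On a linear segment, f(mid) equals the average of the endpoint values."""
    mid = 0.5 * (lo + hi)
    return abs(f(mid) - 0.5 * (f(lo) + f(hi))) < eps

def find_critical_point(f, lo, hi, tol=1e-6):
    """Bisect toward the single kink in (lo, hi); assumes exactly one exists."""
    if is_linear(f, lo, hi):
        raise ValueError("no kink detected in the interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if not is_linear(f, lo, mid):
            hi = mid            # kink lies strictly inside the left half
        elif not is_linear(f, mid, hi):
            lo = mid            # kink lies strictly inside the right half
        else:
            return mid          # both halves are linear, so the kink is at mid
    return 0.5 * (lo + hi)

# Along this line, the first hidden unit switches on at t = -b[0] / W[0, 0] = -0.25.
t_star = find_critical_point(along_line, -1.0, 1.0)
```

The recovered `t_star` marks where the first hidden unit's pre-activation crosses zero along the query line. A weight-recovery attack can build on this kind of information, since it comes purely from input-output behavior and requires no training.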


This review was generated by AI and checked by a human editor.