QUICK REVIEW

[Paper Review] Stealing Machine Learning Models via Prediction APIs

Florian Tramèr, Fan Zhang|arXiv (Cornell University)|Sep 9, 2016

Adversarial Robustness in Machine Learning | Cited by 733

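One-Sentence Summary

This paper demonstrates practical model extraction attacks against ML models served through prediction APIs. By exploiting confidence values in the outputs and the APIs' acceptance of partial inputs, the attacks recover target models (including logistic regression, neural networks, and decision trees) with near-perfect fidelity. The paper also discusses countermeasures.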

ABSTRACT

Machine learning (ML) models may be deemed confidential due to their sensitive training data, commercial value, or use in security applications. Increasingly often, confidential ML models are being deployed with publicly accessible query interfaces. ML-as-a-service ("predictive analytics") systems are an example: Some allow users to train models on potentially sensitive data and charge others for access on a pay-per-query basis. The tension between model confidentiality and public access motivates our investigation of model extraction attacks. In such attacks, an adversary with black-box access, but no prior knowledge of an ML model's parameters or training data, aims to duplicate the functionality of (i.e., "steal") the model. Unlike in classical learning theory settings, ML-as-a-service offerings may accept partial feature vectors as inputs and include confidence values with predictions. Given these practices, we show simple, efficient attacks that extract target ML models with near-perfect fidelity for popular model classes including logistic regression, neural networks, and decision trees. We demonstrate these attacks against the online services of BigML and Amazon Machine Learning. We further show that the natural countermeasure of omitting confidence values from model outputs still admits potentially harmful model extraction attacks. Our results highlight the need for careful ML model deployment and new model extraction countermeasures.

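Motivation and Objectives

Motivate and formalize the risk to confidential ML models that are exposed through prediction APIs in MLaaS settings.
Demonstrate practical extraction attacks across common model classes (logistic regression, neural networks, decision trees).
Quantify attack efficiency against real services and identify potential countermeasures, such as returning class labels only.
Highlight the privacy and security implications of model extraction, including training-data leakage and evasion.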

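Proposed Method

Define a black-box model extraction framework that uses test error and uniform error metrics (R_test and R_unif) to quantify how closely the extracted model matches the target.
Present equation-solving attacks that use confidence values in the outputs and non-adaptive batch queries to recover the parameters of logistic models.
Develop path-finding attacks that use confidence values as identifiers to reconstruct decision trees.
Evaluate the attacks against ML services (Amazon and BigML) using their real prediction APIs and public datasets.
Extend the attacks to multiclass logistic regression, neural networks, and kernel logistic regression to illustrate data leakage and model reconstruction capabilities.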
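
The equation-solving idea for binary logistic regression can be illustrated with a minimal sketch. It assumes a hypothetical local stand-in for the prediction API (`make_target`), a model with d = 10 features, and queries drawn uniformly from [-1, 1]; none of these names or settings come from the paper. Because the confidence value is a sigmoid of w·x + b, applying the inverse sigmoid to each response yields one linear equation, so d + 1 non-adaptive queries suffice to solve for the d + 1 unknowns. The last function estimates the uniform error as the fraction of uniformly sampled inputs on which the two models disagree.

```python
import numpy as np

def make_target(d, seed=0):
    """Hypothetical stand-in for a prediction API serving binary logistic regression."""
    rng = np.random.default_rng(seed)
    w, b = rng.normal(size=d), rng.normal()

    def api(X):
        # Returns the confidence P(y = 1 | x) for each row of X.
        return 1.0 / (1.0 + np.exp(-(X @ w + b)))

    return api, w, b

def extract_logistic(api, d, seed=1):
    """Recover (w, b) from d + 1 non-adaptive queries by solving a linear system."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, size=(d + 1, d))   # one batch of random queries
    p = api(X)
    logits = np.log(p / (1.0 - p))                # inverse sigmoid gives w.x + b
    A = np.hstack([X, np.ones((d + 1, 1))])       # extra column of ones for the bias
    theta = np.linalg.solve(A, logits)
    return theta[:d], theta[d]

def uniform_error(api, w_hat, b_hat, d, n=10_000, seed=2):
    """Fraction of uniformly sampled inputs where the predicted labels disagree."""
    rng = np.random.default_rng(seed)
    U = rng.uniform(-1.0, 1.0, size=(n, d))
    target_labels = api(U) >= 0.5
    stolen_labels = (U @ w_hat + b_hat) >= 0.0
    return float(np.mean(target_labels != stolen_labels))

api, w_true, b_true = make_target(d=10)
w_hat, b_hat = extract_logistic(api, d=10)
print("max parameter error:", np.max(np.abs(w_hat - w_true)))
print("estimated uniform error:", uniform_error(api, w_hat, b_hat, d=10))
```

A real service would round or otherwise limit the confidence values it returns, so the recovered parameters would be approximate rather than exact to floating-point precision as in this toy setting.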

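Experimental Results

Research Questions

RQ1: With only black-box access to an ML prediction API that returns predictions and confidence scores, can an attacker recover an equivalent or exact model?
RQ2: Do confidence values and partial queries enable effective model extraction across common model classes (LR, SVM, neural networks, decision trees)?
RQ3: What are the practical implications and limitations of model extraction against current MLaaS providers such as Amazon and BigML?
RQ4: Which countermeasures (for example, returning class labels only) remain vulnerable, and what additional protections are needed?
RQ5: Does model extraction leak information about the training data, and in which settings (for example, kernel logistic regression) does that leakage become apparent?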

Key Findings

Service	Model Type	Dataset	Queries	Time (s)
Amazon	Logistic Regression	Digits	650	70
Amazon	Logistic Regression	Adult	1,485	149
BigML	Decision Tree	German Credit	1,150	631
BigML	Decision Tree	Steak Survey	4,013	2,088

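Equation-solving attacks recover the parameters of binary and multiclass logistic regression and of neural networks using non-adaptive batch queries.
For multiclass LR and MLPs, the attack needs a number of queries roughly equal to the number of unknown parameters (k) and achieves near-perfect extraction (R_test and R_unif close to 0).
Decision trees can be extracted by using confidence values as quasi-identifiers to discover decision paths, which makes exact learning practical for some targets.
The table above shows that extraction against live services is fast: 650 queries and 70 seconds for Amazon logistic regression on Digits; 1,485 queries and 149 seconds for Amazon logistic regression on Adult; 1,150 queries and 631 seconds for a BigML decision tree on German Credit; and 4,013 queries and 2,088 seconds for a BigML decision tree on Steak Survey.
Even when confidence values are omitted, adaptive attacks still agree with the target on more than 99% of the input space for a range of models, although in some cases they need more queries (up to roughly 100 times as many).
Kernel logistic regression can leak training data through the recovered representer vectors, which can closely resemble points from the training set.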
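
The use of confidence values as quasi-identifiers for decision tree leaves can be sketched in a simplified form. The example below is not the paper's path-finding algorithm: it assumes a hypothetical two-feature toy tree (`toy_tree_api`) whose leaves each return a distinct confidence value, and it only shows one building block, a binary search along a single feature for the point where the returned leaf identifier changes. That point approximates a split threshold.

```python
def toy_tree_api(x):
    """Hypothetical black box: returns a confidence value unique to each leaf."""
    if x[0] < 0.3:
        return 0.91
    if x[1] < 0.6:
        return 0.72
    return 0.55

def find_threshold(api, x, feature, lo, hi, tol=1e-6):
    """Binary-search one feature for the value where the leaf identifier changes.

    Assumes the leaf reached at `lo` covers a single contiguous interval
    between lo and hi on this feature. Returns None when both endpoints
    fall in the same leaf.
    """
    def query(value):
        probe = list(x)
        probe[feature] = value
        return api(probe)

    id_lo = query(lo)
    if id_lo == query(hi):
        return None
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if query(mid) == id_lo:
            lo = mid      # still in the starting leaf, move right
        else:
            hi = mid      # a different leaf, move left
    return (lo + hi) / 2

start = [0.5, 0.5]
print("split on feature 0 near:", find_threshold(toy_tree_api, start, 0, 0.0, 1.0))
print("split on feature 1 near:", find_threshold(toy_tree_api, start, 1, 0.0, 1.0))
```

Each search costs about log2((hi - lo) / tol) queries. A full extraction would repeat such searches for every feature from every newly discovered leaf, and would also need to handle categorical features and leaves that happen to share a confidence value.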

This review was generated by AI and reviewed by a human editor.