QUICK REVIEW

[Paper Review] Dataset Inference: Ownership Resolution in Machine Learning

Pratyush Maini, Mohammad Yaghini|arXiv (Cornell University)|Apr 21, 2021

Adversarial Robustness in Machine Learning | References: 38 | Cited by: 23

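One-Sentence Summary

This paper proposes dataset inference (DI), a novel ownership-resolution defense that identifies model theft by detecting whether a suspect model contains knowledge from the victim's private training data. By measuring prediction certainty (the margin to the decision boundary) on a small sample of the victim's training data, DI detects theft with more than 99% confidence even when only 50 samples are revealed, and it requires no retraining and does not affect model accuracy.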

ABSTRACT

With increasingly more data and computation involved in their training, machine learning models constitute valuable intellectual property. This has spurred interest in model stealing, which is made more practical by advances in learning with partial, little, or no supervision. Existing defenses focus on inserting unique watermarks in a model's decision surface, but this is insufficient: the watermarks are not sampled from the training distribution and thus are not always preserved during model stealing. In this paper, we make the key observation that knowledge contained in the stolen model's training set is what is common to all stolen copies. The adversary's goal, irrespective of the attack employed, is always to extract this knowledge or its by-products. This gives the original model's owner a strong advantage over the adversary: model owners have access to the original training data. We thus introduce dataset inference, the process of identifying whether a suspected model copy has private knowledge from the original model's dataset, as a defense against model stealing. We develop an approach for dataset inference that combines statistical testing with the ability to estimate the distance of multiple data points to the decision boundary. Our experiments on CIFAR10, SVHN, CIFAR100 and ImageNet show that model owners can claim with confidence greater than 99% that their model (or dataset as a matter of fact) was stolen, despite only exposing 50 of the stolen model's training points. Dataset inference defends against state-of-the-art attacks even when the adversary is adaptive. Unlike prior work, it does not require retraining or overfitting the defended model.

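Motivation and Objectives

Address the challenge of proving ownership when machine learning models are stolen, especially when distillation or extraction attacks defeat traditional watermarking.
Overcome the limitations of existing watermarking defenses, which require retraining the model and degrade its performance.
Exploit the fact that every stolen model inherently contains knowledge from the victim's training data, regardless of the attack vector.
Develop a defense that exploits an information asymmetry: the victim has access to the original training data, while the adversary does not.
Enable reliable, high-confidence ownership claims without modifying or retraining the model.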

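Proposed Method

Measure the suspect model's prediction certainty (the margin to the decision boundary) on a small private subset of the victim's training data.
Train a confidence regressor on embeddings of the victim model's training and validation sets to estimate the margin distribution.
Use a statistical hypothesis test (e.g., a t-test or permutation test) to compare the suspect model's mean margin on the victim's training data against its mean margin on random test data.
Set a p-value threshold (e.g., 10⁻³) to decide whether the suspect model is significantly more confident on the victim's training data, which indicates knowledge leakage.
Apply the method across architectures (e.g., Wide ResNet-50-2, AlexNet, Inception V3) and datasets (CIFAR-10, SVHN, ImageNet) to evaluate how well it generalizes.
Evaluate robustness under adaptive attacks and under different data overlap rates (λ) between the victim's and the adversary's datasets.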
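
The hypothesis-testing step can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the per-sample margins have already been estimated and are passed in as plain lists, and it uses a one-sided permutation test built on the Python standard library. The function names `permutation_p_value` and `flags_theft`, the default threshold, and the synthetic margins in the demo are all hypothetical choices made for illustration.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def permutation_p_value(train_margins, test_margins, n_permutations=10_000, seed=0):
    """One-sided permutation test (illustrative, not the paper's code).

    Estimates how often a random relabelling of the pooled margins gives a
    mean difference at least as large as the observed (train - test) one.
    """
    rng = random.Random(seed)
    observed = _mean(train_margins) - _mean(test_margins)
    pooled = list(train_margins) + list(test_margins)
    n_train = len(train_margins)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = _mean(pooled[:n_train]) - _mean(pooled[n_train:])
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_permutations + 1)


def flags_theft(train_margins, test_margins, alpha=1e-3, **kwargs):
    """Return True if the suspect model is significantly more confident
    on the owner's training points than on unseen test points."""
    return permutation_p_value(train_margins, test_margins, **kwargs) < alpha


if __name__ == "__main__":
    # Synthetic margins only; real margins would come from the suspect model.
    rng = random.Random(42)
    train = [rng.gauss(2.0, 0.5) for _ in range(50)]
    test = [rng.gauss(1.0, 0.5) for _ in range(50)]
    print("p-value:", permutation_p_value(train, test))
```

One design point worth noting: a permutation test cannot report a p-value below 1 / (n_permutations + 1), so the number of permutations has to be large enough for the chosen threshold to be reachable. With 10,000 permutations the floor is roughly 10⁻⁴, which leaves room for a threshold of 10⁻³.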

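Experimental Results

Research Questions

RQ1: Can a model owner reliably detect whether a suspect model was derived from their private training data, even if the model was obtained through query-based extraction or outright data theft?
RQ2: How many training samples from the victim's dataset are needed to make a high-confidence ownership claim through statistical inference on prediction certainty?
RQ3: Does dataset inference remain effective when the adversary uses distillation, fine-tuning, or data-free knowledge transfer?
RQ4: How does dataset inference perform on large-scale benchmarks such as ImageNet, especially where overfitting is less likely?
RQ5: What is the minimum overlap (λ) between the victim's training data and the adversary's dataset that dataset inference needs in order to detect knowledge leakage?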

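Key Findings

Even when testing with only 10 samples from the victim's training set, DI achieves a p-value < 10⁻³ on ImageNet, indicating very high confidence in ownership detection.
On CIFAR-10 and SVHN, DI detects model theft with more than 99% confidence while requiring the victim to reveal only 50 training samples.
DI still detects knowledge leakage when the adversary uses data-free distillation or fine-tuning, indicating robustness to adaptive attacks.
The method performs well across architectures (e.g., Wide ResNet-50-2, AlexNet, Inception V3), indicating that it scales to complex models.
When the victim's training data has a 10% overlap with the adversary's dataset (λ = 0.1), DI still detects theft with a p-value < 10⁻⁴, showing sensitivity to even small amounts of data leakage.
The effect size of the test grows with the overlap rate (λ), confirming that DI's confidence increases as more training data is shared.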


This review was generated by AI and checked by a human editor.