QUICK REVIEW

[Paper Review] DeepSigns: A Generic Watermarking Framework for IP Protection of Deep Learning Models

Bita Darvish Rouhani, Huili Chen|arXiv (Cornell University)|Apr 2, 2018

Adversarial Robustness in Machine Learning | Cited by 84

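One-Sentence Summary

DeepSigns embeds robust digital watermarks into deep learning models by shaping the activation distributions of selected layers and by applying a post-processing step at the output layer. This enables proof of IP ownership in both white-box and black-box settings while preserving model accuracy and resisting common attacks.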

ABSTRACT

Deep Learning (DL) models have caused a paradigm shift in our ability to comprehend raw data in various important fields, ranging from intelligence warfare and healthcare to autonomous transportation and automated manufacturing. A practical concern, in the rush to adopt DL models as a service, is protecting the models against Intellectual Property (IP) infringement. The DL models are commonly built by allocating significant computational resources that process vast amounts of proprietary training data. The resulting models are therefore considered to be the IP of the model builder and need to be protected to preserve the owner's competitive advantage. This paper proposes DeepSigns, a novel end-to-end IP protection framework that enables insertion of coherent digital watermarks in contemporary DL models. DeepSigns, for the first time, introduces a generic watermarking methodology that can be used for protecting DL owner's IP rights in both white-box and black-box settings, where the adversary may or may not have the knowledge of the model internals. The suggested methodology is based on embedding the owner's signature (watermark) in the probability density function (pdf) of the data abstraction obtained in different layers of a DL model. DeepSigns can demonstrably withstand various removal and transformation attacks, including model compression, model fine-tuning, and watermark overwriting. Proof-of-concept evaluations on MNIST, and CIFAR10 datasets, as well as a wide variety of neural network architectures including Wide Residual Networks, Convolution Neural Networks, and Multi-Layer Perceptrons corroborate DeepSigns' effectiveness and applicability.

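Motivation and Goals

Motivate IP protection for deep learning models as they are increasingly deployed as services.
Propose a generic watermarking framework that works in both white-box and black-box settings.
Embed watermarks in the activation distributions of hidden layers and, after training, in the output layer, without harming accuracy.
Demonstrate robustness to model compression, fine-tuning, and watermark overwriting.
Provide practical evaluation metrics and an API to encourage adoption across architectures.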

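Proposed Method

Embed an N-bit watermark string in the means of the Gaussian components of a layer's activation distribution (a Gaussian mixture model prior).
Add a term (loss1) to the training loss that aligns activation means with the selected Gaussian centers, plus a second term (loss2) that drives the binarized projection of the activation features toward the watermark bits, optimized with stochastic gradient descent.
Map the selected Gaussian centers to the watermark bits (b) with a random projection matrix A, applying a sigmoid followed by hard thresholding.
Jointly optimize loss0 (classification), loss1 (GMM alignment), and loss2 (watermark-bit alignment) during training so the watermark is embedded without sacrificing accuracy.
Watermark the output layer in a post-processing step: generate K input keys from the tail regions of the class-conditional distributions, then fine-tune on these keys so the model assigns each key sample its designated label.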
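The white-box extraction step described above (project the Gaussian centers with A, apply a sigmoid, then hard-threshold) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the dimensions `M`, `N`, `S`, the function names, the 0.5 threshold, and the use of binary cross-entropy for loss2 are all choices made here for the example, and the demo simply defines the target bits from the centers so that extraction succeeds by construction.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def project(centers, A):
    """Return sigmoid(centers @ A) as a list of rows (one row per center)."""
    n_bits = len(A[0])
    return [
        [sigmoid(sum(c[m] * A[m][n] for m in range(len(c)))) for n in range(n_bits)]
        for c in centers
    ]


def extract_bits(centers, A, threshold=0.5):
    """Hard-threshold the projected centers to recover watermark bits."""
    return [[1 if g >= threshold else 0 for g in row] for row in project(centers, A)]


def bit_error_rate(bits, target):
    """Fraction of extracted bits that differ from the owner's bits."""
    pairs = [(b, t) for brow, trow in zip(bits, target) for b, t in zip(brow, trow)]
    return sum(b != t for b, t in pairs) / len(pairs)


def watermark_loss(centers, A, target, eps=1e-12):
    """Binary cross-entropy between target bits and projected centers.

    Assumed form of loss2 for this sketch; the summary does not specify it.
    """
    total, count = 0.0, 0
    for prow, trow in zip(project(centers, A), target):
        for p, t in zip(prow, trow):
            total -= t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
            count += 1
    return total / count


# Hypothetical sizes: feature dimension, bits per center, watermarked centers.
M, N, S = 8, 16, 2
rng = random.Random(0)
A = [[rng.gauss(0, 1) for _ in range(N)] for _ in range(M)]        # owner's secret projection
centers = [[rng.gauss(0, 1) for _ in range(M)] for _ in range(S)]  # stand-in for trained means

# Demo shortcut: pretend embedding already converged, so the target equals
# whatever these centers decode to. Real training would move the centers instead.
target = extract_bits(centers, A)
flipped = [[1 - b for b in row] for row in target]

print(bit_error_rate(extract_bits(centers, A), target))   # 0.0 by construction
print(bit_error_rate(extract_bits(centers, A), flipped))  # 1.0 by construction
```

In an actual embedding run the direction is reversed: the owner fixes the bits and A first, and the loss term moves the activation means until the extracted bits match. A bit error rate of zero on the owner's bits is then the evidence of ownership.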

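Experimental Results

Research Questions

RQ1: Can a generic watermarking framework protect ownership of DL models in both white-box and black-box deployments?
RQ2: Is it possible to embed a robust watermark without reducing the accuracy of the base model across architectures (MLP, CNN, ResNet, WideResNet)?
RQ3: How robust is the watermark to common DL model transformations such as pruning, fine-tuning, and overwriting?
RQ4: Can watermark extraction reliably verify ownership across settings with a low false-positive rate and a reasonable detection threshold?
RQ5: What practical evaluation metrics and API support are needed to encourage adoption of watermarking in real-world DL practice?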

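Key Findings

DeepSigns embeds binary information in the pdf of intermediate activations and in the output layer without reducing prediction accuracy on the evaluated models.
In extensive experiments on MNIST and CIFAR-10 with several architectures (MLP, CNN, WideResNet), the framework is robust to pruning, fine-tuning, and watermark overwriting.
It offers a two-part watermarking approach: a hidden-layer watermark (watermarked centers and a projection) and an output-layer watermark triggered by key inputs added after training.
The method achieves high detection capability and controls false positives through carefully chosen keys and thresholds, in both white-box and black-box scenarios.
It proposes an API and a set of evaluation metrics to encourage adoption and to ease comparison with future DL watermarking methods.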
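To make the role of keys and thresholds in black-box detection concrete, the sketch below counts how many of the K key inputs a queried model labels as the owner assigned, and picks the smallest match count whose chance of occurring in an unrelated model stays under a target false-positive rate. The binomial model, the per-key match probability `p_match`, and all function names are assumptions made for this illustration; the summary does not state how the paper derives its threshold.

```python
from math import comb


def count_matches(predict, keys, key_labels):
    """Number of key inputs the queried model labels as the owner assigned."""
    return sum(predict(x) == y for x, y in zip(keys, key_labels))


def false_positive_prob(num_keys, min_matches, p_match):
    """P[X >= min_matches] for X ~ Binomial(num_keys, p_match).

    Assumption: an unrelated model matches each key label independently
    with probability p_match.
    """
    return sum(
        comb(num_keys, k) * p_match**k * (1 - p_match) ** (num_keys - k)
        for k in range(min_matches, num_keys + 1)
    )


def min_matches_for(num_keys, p_match, max_fp):
    """Smallest match count whose false-positive probability is <= max_fp."""
    for m in range(num_keys + 1):
        if false_positive_prob(num_keys, m, p_match) <= max_fp:
            return m
    return None  # even matching every key is not rare enough


# Toy setup: 20 integer "keys" and a stand-in model that happens to agree
# with every assigned label. p_match = 0.1 assumes 10 equally likely classes.
keys = list(range(20))
key_labels = [k % 10 for k in keys]
marked_model = lambda x: x % 10

threshold = min_matches_for(len(keys), p_match=0.1, max_fp=1e-3)
matches = count_matches(marked_model, keys, key_labels)
print(matches, threshold, matches >= threshold)
```

Raising K or lowering `max_fp` trades query cost against confidence: more keys let the owner tolerate a few mismatches caused by fine-tuning or pruning while keeping the false-positive probability small.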


This review was generated by AI and checked by a human editor.