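[Paper Explained] How to Prove Your Model Belongs to You: A Blind-Watermark based Framework to Protect Intellectual Property of DNN
This paper proposes a blind-watermark IPP framework for DNNs. It embeds key samples that are almost indistinguishable from ordinary samples into a model in order to prove ownership, defend against evasion attacks, and resist fraudulent ownership claims, and it reports strong empirical results across multiple datasets and model architectures.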
Deep learning techniques have made tremendous progress in a variety of challenging tasks, such as image recognition and machine translation, during the past decade. Training deep neural networks is computationally expensive and requires both human and intellectual resources. Therefore, it is necessary to protect the intellectual property of the model and externally verify the ownership of the model. However, previous studies either fail to defend against the evasion attack or have not explicitly dealt with fraudulent claims of ownership by adversaries. Furthermore, they cannot establish a clear association between the model and the creator's identity. To fill these gaps, in this paper, we propose a novel intellectual property protection (IPP) framework based on blind-watermark for watermarking deep neural networks that meets the requirements of security and feasibility. Our framework accepts ordinary samples and the exclusive logo as inputs, outputs newly generated samples as watermarks, which are almost indistinguishable from the originals, and infuses these watermarks into DNN models by assigning specific labels, leaving the backdoor as the basis for our copyright claim. We evaluated our IPP framework on two benchmark datasets and 15 popular deep learning models. The results show that our framework successfully verifies the ownership of all the models without a noticeable impact on their primary task. Most importantly, we are the first to successfully design and implement a blind-watermark based framework, which can achieve state-of-the-art performance on undetectability against evasion attack and unforgeability against fraudulent claims of ownership. Further, our framework shows remarkable robustness and establishes a clear association between the model and the author's identity.
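Motivation and Objectives
- Motivate the need to protect the intellectual property of DNNs and address the limitations of earlier watermarking methods.
- Propose a blind-watermark based IPP framework that ties a model to its creator's identity.
- Demonstrate feasibility and practicality through a prototype and an empirical evaluation across multiple datasets and architectures.
- Evaluate robustness against evasion attacks and fraudulent ownership claims.
- Show that watermarking has little impact on the model's primary-task performance while still enabling reliable ownership verification.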
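Proposed Method
- Embed the watermark by generating key samples x^key = G(e, x, l), where e is a lightweight encoder, x is an ordinary sample, and l is the exclusive logo. Training uses an autoencoder-style setup with a discriminator so that the key-sample distribution P_e aligns with the data distribution P_data.
- Use an adversarial (discriminator) objective to minimize the KL divergence between P_data and P_e, combined with an SSIM-based reconstruction loss that keeps key samples indistinguishable from the originals.
- Backdoor the host DNN so that x^key maps to a predefined label t^key, which allows ownership to be verified through high accuracy on the key samples.
- Provide a verification procedure in which the owner queries the remote model with key samples and checks whether acc_g(x^key, t^key) exceeds a threshold.
- Specify the joint objective O_e, which combines reconstruction fidelity, SSIM, and adversarial loss and coordinates the encoder, discriminator, and host model during training.
- Outline the overall pipeline, consisting of the encoder, the discriminator, and the host DNN, together with the training protocol and hyperparameter settings.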
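The verification step listed above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `key_sample_accuracy` and `verify_ownership`, the toy models, and the 0.9 threshold are all hypothetical. The summary only says that the accuracy on key samples must exceed a threshold.

```python
from typing import Any, Callable, Sequence


def key_sample_accuracy(
    query_model: Callable[[Any], int],
    key_samples: Sequence[Any],
    key_labels: Sequence[int],
) -> float:
    """Fraction of key samples that the queried model maps to their key label."""
    if not key_samples or len(key_samples) != len(key_labels):
        raise ValueError("need equally sized, non-empty key samples and labels")
    hits = sum(1 for x, t in zip(key_samples, key_labels) if query_model(x) == t)
    return hits / len(key_samples)


def verify_ownership(
    query_model: Callable[[Any], int],
    key_samples: Sequence[Any],
    key_labels: Sequence[int],
    threshold: float = 0.9,  # assumed value for illustration, not from the paper
) -> bool:
    """Claim ownership only if key-sample accuracy reaches the threshold."""
    return key_sample_accuracy(query_model, key_samples, key_labels) >= threshold


# Toy stand-ins for remote models (hypothetical). A sample is (id, is_key).
def watermarked_model(sample):
    _, is_key = sample
    return 7 if is_key else 0  # backdoor: key samples are sent to label 7


def clean_model(sample):
    return 0  # a model without the backdoor never outputs the key label


key_samples = [(i, True) for i in range(20)]
key_labels = [7] * 20

print(verify_ownership(watermarked_model, key_samples, key_labels))  # True
print(verify_ownership(clean_model, key_samples, key_labels))        # False
```

In a real deployment `query_model` would wrap calls to the remote prediction API, and the threshold would be chosen so that a model without the backdoor is very unlikely to pass by chance.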
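Experimental Results
Research Questions
- RQ1: Can the blind-watermark IPP framework reliably prove ownership of a DNN while preserving fidelity on the original task?
- RQ2: Is the proposed framework more robust to evasion attacks and fraudulent ownership claims than prior watermarking methods?
- RQ3: Is the watermark distribution close to the original data distribution, so that imperceptibility and robustness are ensured?
- RQ4: In practical scenarios, can the framework establish a clear association between the model and its creator's identity?
- RQ5: How much does watermark embedding affect model accuracy across architectures and datasets?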
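Key Findings
- Watermarked models reach accuracy close to that of non-watermarked models: the fidelity drop averages 0.66%, with a minimum of 0.14%.
- Key samples achieve high verification accuracy: watermarked models exceed 90% accuracy on key samples and sometimes reach 100%.
- The blind-watermark approach is undetectable under evasion attacks: detectors perform little or no better than random guessing (AUC of roughly 0.5 to 0.65 in the extended tests).
- Under reasonable assumptions, the framework makes it hard to forge valid key samples, which counters fraudulent ownership claims.
- Experiments on 15 host DNNs across MNIST and CIFAR-10 show successful ownership verification with limited impact on the primary task.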
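To make the undetectability finding concrete, the sketch below computes the AUC of a detector that tries to separate key samples from clean samples. It uses the standard pairwise (Mann-Whitney) definition of AUC. The function name `roc_auc` and the example scores are hypothetical and are not data from the paper.

```python
from typing import Sequence


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Made-up detector scores: higher means "looks like a key sample".
# A detector that separates key samples perfectly:
print(roc_auc([0.9, 0.8], [0.1, 0.2]))  # 1.0
# A detector whose scores carry no information about key samples:
print(roc_auc([0.1, 0.9], [0.2, 0.8]))  # 0.5
```

An AUC of 0.5 means the detector ranks key samples above clean samples only half of the time, which is what random guessing achieves. Values near that level indicate that an adversary cannot reliably filter out ownership queries.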
This explainer was generated by AI and reviewed by human editors.