QUICK REVIEW

[Paper Review] Piracy Resistant Watermarks for Deep Neural Networks

Huiying Li, Emily Wenger|arXiv (Cornell University)|Oct 2, 2019

Adversarial Robustness in Machine Learning | References: 43 | Cited by: 31

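One-Sentence Summary

This paper proposes null embedding, a new method that builds piracy-resistant watermarks into deep neural networks by forcing a strong dependency between normal classification accuracy and the watermark during initial training. Unlike earlier approaches that rely on incremental training, null embedding prevents attackers from removing the watermark or adding a new one without destroying model performance, and it achieves strong piracy resistance across a range of models and tasks.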

ABSTRACT

As companies continue to invest heavily in larger, more accurate and more robust deep learning models, they are exploring approaches to monetize their models while protecting their intellectual property. Model licensing is promising, but requires a robust tool for owners to claim ownership of models, i.e. a watermark. Unfortunately, current designs have not been able to address piracy attacks, where third parties falsely claim model ownership by embedding their own "pirate watermarks" into an already-watermarked model. We observe that resistance to piracy attacks is fundamentally at odds with the current use of incremental training to embed watermarks into models. In this work, we propose null embedding, a new way to build piracy-resistant watermarks into DNNs that can only take place at a model's initial training. A null embedding takes a bit string (watermark value) as input, and builds strong dependencies between the model's normal classification accuracy and the watermark. As a result, attackers cannot remove an embedded watermark via tuning or incremental training, and cannot add new pirate watermarks to already watermarked models. We empirically show that our proposed watermarks achieve piracy resistance and other watermark properties, over a wide range of tasks and models. Finally, we explore a number of adaptive counter-measures, and show our watermark remains robust against a variety of model modifications, including model fine-tuning, compression, and existing methods to detect/remove backdoors. Our watermarked models are also amenable to transfer learning without losing their watermark properties.

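Motivation and Objectives

Address the lack of piracy resistance in existing DNN watermarking schemes, where an attacker can embed a forged watermark through incremental training.
Overcome the root weakness of current watermarking methods: their reliance on incremental training, which lets attackers overwrite or add watermarks.
Design an unforgeable, persistent, and verifiable watermarking system that remains effective under model fine-tuning, compression, and transfer learning.
Ensure that any attempt to embed a new watermark into an already-watermarked model causes a catastrophic drop in classification accuracy, deterring malicious use.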

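Proposed Method

Propose null embedding: a technique that embeds the watermark bit string as a constraint during initial model training, building a strong dependency between the watermark and the model's normal classification behavior.
Use public-key cryptography and verifiable signatures to bind the watermark bit string to the model owner, enabling authentication and verification.
Modify the optimization during training so that the model weights satisfy the watermark consistency requirement while still reaching high classification accuracy.
Make any attempt to modify or add a watermark through incremental training conflict with the original watermark constraint, so that the attempt degrades model accuracy.
Exploit the irreversibility of the initial training phase so that the watermark cannot be removed or replaced without retraining from scratch.
Design the watermark to stay intact through transfer learning, model compression, and other common model modifications, so that proof of ownership persists.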
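
The summary above does not spell out how a bit string becomes a training constraint, so the following is only a minimal sketch of one plausible reading: signature bytes are hashed into a small +1/-1 pattern, the pattern is stamped onto inputs using extreme pixel values, and the stamped inputs keep their original labels during initial training. The function names, the SHA-256 expansion, the 4x4 pattern size, the fixed top-left position, and the magnitude of 2000.0 are all illustrative assumptions, not details confirmed by this review.

```python
import hashlib


def derive_pattern(signature, size=4):
    """Expand signature bytes into a size x size grid of +1/-1 values.

    Hypothetical scheme: the bits come from a SHA-256 digest of the signature.
    """
    n_bits = size * size
    if n_bits > 256:
        raise ValueError("SHA-256 yields only 256 bits")
    digest = hashlib.sha256(signature).digest()
    bits = [(digest[i // 8] >> (i % 8)) & 1 for i in range(n_bits)]
    return [[1 if bits[r * size + c] else -1 for c in range(size)]
            for r in range(size)]


def apply_null_pattern(image, pattern, magnitude=2000.0, top=0, left=0):
    """Return a copy of `image` with the pattern region set to extreme values.

    `magnitude` is a placeholder; it only needs to lie far outside the normal
    pixel range for the illustration to make sense.
    """
    out = [row[:] for row in image]  # copy rows so the caller's image is untouched
    for r, pattern_row in enumerate(pattern):
        for c, sign in enumerate(pattern_row):
            out[top + r][left + c] = sign * magnitude
    return out


def null_embedding_pairs(dataset, pattern):
    """Yield each clean (image, label) pair plus a stamped copy with the SAME label."""
    for image, label in dataset:
        yield image, label
        yield apply_null_pattern(image, pattern), label
```

Under these assumptions, training on the output of `null_embedding_pairs` from the first epoch ties normal classification to the owner's pattern, which is the dependency the method relies on. A real system would obtain `signature` from a public-key signing step, which is outside this sketch.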

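Experimental Results

Research Questions

RQ1: Is it possible to design a DNN watermarking scheme that truly resists piracy attacks, in which an attacker embeds a forged watermark into an already-watermarked model?
RQ2: Why are existing watermarking techniques vulnerable to piracy attacks, and which properties of their architecture or training make the vulnerability possible?
RQ3: Can a watermark be embedded in a way that prevents its removal or replacement through incremental training or fine-tuning?
RQ4: How robust is the proposed watermark under model modifications, including compression, transfer learning, and backdoor detection/removal techniques?
RQ5: When a model is extracted or retrained with out-of-distribution data (e.g., ImageNet, YouTube Faces), does the watermark persist and remain verifiable?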

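Key Findings

Existing watermarking methods, both regularization-based and artifact-based, are vulnerable to piracy attacks because an attacker can successfully embed a new watermark through incremental training.
Null embedding blocks any modification or addition of watermarks through incremental training, because such operations push the model's normal classification accuracy below an acceptable threshold.
A piracy attack on a null-embedded model causes a performance loss comparable to training from scratch, which makes such attacks infeasible both computationally and in practice.
The watermark remains robust under model fine-tuning, compression, and existing backdoor detection/removal techniques, so proof of ownership persists.
Watermarked models remain compatible with transfer learning, and watermark integrity is maintained across different downstream tasks.
Model extraction attacks using out-of-distribution data (e.g., ImageNet, YouTube Faces) require significantly more data and computation than training from scratch, and an accuracy improvement becomes possible only once the data volume reaches 255% of the original dataset.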
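
The findings above imply two simple checks, sketched below under stated assumptions: an ownership check that the model still classifies stamped inputs correctly, and a viability check that compares normal accuracy before and after a piracy attempt. The helper names, the `stamp` callable, and both thresholds (0.9 and 0.05) are hypothetical placeholders; this review does not give the paper's actual verification procedure or threshold values.

```python
def accuracy(model, samples):
    """Fraction of (input, label) pairs that `model` classifies correctly."""
    correct = sum(1 for x, y in samples if model(x) == y)
    return correct / len(samples)


def verify_null_embedding(model, samples, stamp, threshold=0.9):
    """Ownership check (illustrative): stamped inputs must keep their labels.

    `stamp` is any callable that applies the owner's pattern to one input.
    `threshold` is a placeholder, not a value taken from the paper.
    """
    stamped = [(stamp(x), y) for x, y in samples]
    return accuracy(model, stamped) >= threshold


def piracy_attempt_viable(acc_before, acc_after, max_drop=0.05):
    """A pirate watermark is only useful if normal accuracy survives embedding it.

    `max_drop` is an assumed tolerance for acceptable accuracy loss.
    """
    return (acc_before - acc_after) <= max_drop
```

With these helpers, the reported behavior would correspond to `verify_null_embedding` continuing to return `True` after fine-tuning or compression, and `piracy_attempt_viable` returning `False` for a model that was incrementally trained with a pirate watermark.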


This review was generated by AI and reviewed by a human editor.