QUICK REVIEW

[Paper Review] BlackMarks: Blackbox Multibit Watermarking for Deep Neural Networks

Huili Chen, Bita Darvish Rouhani|arXiv (Cornell University)|Mar 31, 2019

Advanced Steganography and Watermarking Techniques | References: 32 | Cited by: 41

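One-Sentence Summary

BlackMarks is presented as the first end-to-end black-box framework for multi-bit watermarking of DNNs: it embeds a binary signature in the model's prediction behavior through fine-tuning and extracts it by querying the model with watermark keys, with low overhead and high robustness.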

ABSTRACT

Deep Neural Networks have created a paradigm shift in our ability to comprehend raw data in various important fields ranging from computer vision and natural language processing to intelligence warfare and healthcare. While DNNs are increasingly deployed either in a white-box setting where the model internal is publicly known, or a black-box setting where only the model outputs are known, a practical concern is protecting the models against Intellectual Property (IP) infringement. We propose BlackMarks, the first end-to-end multi-bit watermarking framework that is applicable in the black-box scenario. BlackMarks takes the pre-trained unmarked model and the owner's binary signature as inputs and outputs the corresponding marked model with a set of watermark keys. To do so, BlackMarks first designs a model-dependent encoding scheme that maps all possible classes in the task to bit '0' and bit '1' by clustering the output activations into two groups. Given the owner's watermark signature (a binary string), a set of key image and label pairs are designed using targeted adversarial attacks. The watermark (WM) is then embedded in the prediction behavior of the target DNN by fine-tuning the model with generated WM key set. To extract the WM, the remote model is queried by the WM key images and the owner's signature is decoded from the corresponding predictions according to the designed encoding scheme. We perform a comprehensive evaluation of BlackMarks's performance on MNIST, CIFAR10, ImageNet datasets and corroborate its effectiveness and robustness. BlackMarks preserves the functionality of the original DNN and incurs negligible WM embedding runtime overhead as low as 2.054%.

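Motivation and Objectives

Protect the intellectual property of DNNs deployed in a black-box setting (MLaaS).
Develop a scalable multi-bit watermarking framework that works when only the model's outputs are observable.
Design a model-dependent encoding that maps class outputs to bits, and generate watermark keys with targeted adversarial attacks.
Fine-tune the pre-trained model to embed the watermark while preserving accuracy.
Provide a robust extraction and verification procedure with low false-positive and false-negative rates.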

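Proposed Method

Build the encoding scheme by running K-means clustering on the per-class mean activations (taken before softmax), splitting the classes into two clusters, one standing for bit '0' and the other for bit '1'.
Generate watermark key images and labels with targeted adversarial attacks aligned with the encoding scheme.
Fine-tune the pre-trained model with a regularized loss that combines the standard cross-entropy loss with a watermark-specific loss, embedding the signature.
Extract the watermark by querying the model with the watermark keys, decoding the predictions into bits with the encoding scheme, and computing the bit error rate (BER) against the owner's signature.
Counter the transferability of watermark keys by starting from a larger initial key set and cross-checking candidates against unmarked model variants during key selection.
Provide an efficiency analysis: a one-time embedding overhead as low as 2.054%, plus the cost of black-box extraction.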
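
The encoding and extraction steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy activation values, and the deterministic cluster initialization are all hypothetical, and which cluster is called bit '0' is arbitrary. Key generation via targeted adversarial attacks and the fine-tuning step are not shown; the predicted classes are simply given as lists.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def two_means(points, max_iters=50):
    """Assign each point to cluster 0 or 1 with plain Lloyd iterations.

    The initialization (first point and the point farthest from it) is a
    simplification chosen for determinism; it is an assumption of this sketch.
    """
    c0 = points[0]
    c1 = max(points, key=lambda p: sq_dist(p, c0))
    assign = None
    for _ in range(max_iters):
        new_assign = [0 if sq_dist(p, c0) <= sq_dist(p, c1) else 1 for p in points]
        if new_assign == assign:
            break  # assignments are stable, so the clustering has converged
        assign = new_assign
        group0 = [p for p, a in zip(points, assign) if a == 0]
        group1 = [p for p, a in zip(points, assign) if a == 1]
        if group0:
            c0 = centroid(group0)
        if group1:
            c1 = centroid(group1)
    return assign


def decode_signature(predicted_classes, class_to_bit):
    """Map each predicted class on a key image to its encoded bit."""
    return [class_to_bit[c] for c in predicted_classes]


def bit_error_rate(decoded_bits, signature):
    """Fraction of positions where the decoded bits differ from the signature."""
    if len(decoded_bits) != len(signature):
        raise ValueError("decoded bits and signature must have the same length")
    errors = sum(d != s for d, s in zip(decoded_bits, signature))
    return errors / len(signature)


# Toy per-class mean activations (hypothetical values): 4 classes, 2 dimensions.
class_means = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
class_to_bit = two_means(class_means)  # list indexed by class id

signature = [1, 0, 1, 1]            # owner's binary signature (hypothetical)
marked_predictions = [2, 0, 3, 2]   # classes a marked model returns on the keys
unmarked_predictions = [2, 1, 0, 3]  # classes an unrelated model returns

ber_marked = bit_error_rate(decode_signature(marked_predictions, class_to_bit), signature)
ber_unmarked = bit_error_rate(decode_signature(unmarked_predictions, class_to_bit), signature)
print(class_to_bit, ber_marked, ber_unmarked)
```

In this toy run the marked model reproduces every signature bit, while the unrelated model gets one of four bits wrong. A production version would more likely use a library clustering routine and real pre-softmax activations.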

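Experimental Results

Research Questions

RQ1: Can black-box DNN watermarking strengthen IP protection by offering multi-bit capacity?
RQ2: How can a model-dependent encoding map outputs to bits so that multi-bit watermarking is supported without harming accuracy?
RQ3: How robust are such watermarks to fine-tuning, pruning, and overwriting in the black-box setting?
RQ4: How can the integrity and credibility of the watermark be quantified and ensured in the black-box scenario?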
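
One way to make the integrity question concrete is to ask how likely an unrelated model is to match a multi-bit signature by chance. The sketch below is a back-of-the-envelope calculation, not the paper's analysis: it assumes each decoded bit of an unrelated model matches the signature independently with probability `p_match` (0.5 by default), and the function name, key length, and error tolerance are hypothetical.

```python
from math import comb


def chance_match_probability(num_bits, max_bit_errors=0, p_match=0.5):
    """Probability that an unrelated model decodes to within
    max_bit_errors of a num_bits signature, assuming independent bits
    that each match with probability p_match (an assumption of this sketch).
    """
    return sum(
        comb(num_bits, k) * (1 - p_match) ** k * p_match ** (num_bits - k)
        for k in range(max_bit_errors + 1)
    )


# Exact match on a hypothetical 20-bit signature: 0.5 ** 20.
p_exact = chance_match_probability(20)
# Tolerating one wrong bit on a hypothetical 4-bit signature: (1 + 4) / 16.
p_tolerant = chance_match_probability(4, max_bit_errors=1)
print(p_exact, p_tolerant)
```

Under these assumptions the false-positive probability shrinks exponentially with signature length, which is the intuition behind preferring a multi-bit signature over a single yes/no mark.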

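Key Findings

With the designated key set, BlackMarks detects the embedded watermark with zero BER on MNIST, CIFAR-10, and ImageNet.
The watermark tolerates parameter pruning of up to 95%, 80%, and 90% on MNIST, CIFAR-10, and ImageNet, respectively, with no drop in detection rate (although excessive pruning harms accuracy).
The watermark remains detectable after model fine-tuning (up to 100 epochs in the experiments), with zero BER on all benchmarks.
Overwriting with a new watermark does not prevent recovery of the original one (BER stays at zero).
Embedding adds little runtime overhead (as low as 2.054%). The watermarking procedure also appears to improve robustness to adversarial perturbations: the authors note improved accuracy under several attacks as a side benefit.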
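
For readers unfamiliar with the pruning robustness test, the sketch below shows one common form of parameter pruning: zeroing the smallest-magnitude weights. Treating the pruning as magnitude-based is an assumption of this example, and the function name and weight values are hypothetical. In a robustness check, the pruned model would then be queried with the watermark keys and the BER recomputed.

```python
def magnitude_prune(weights, prune_rate):
    """Return a copy of weights with the smallest-magnitude fraction set to zero.

    prune_rate is the fraction of parameters to remove, between 0.0 and 1.0.
    """
    if not 0.0 <= prune_rate <= 1.0:
        raise ValueError("prune_rate must be between 0.0 and 1.0")
    num_pruned = int(len(weights) * prune_rate)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:num_pruned]:
        pruned[i] = 0.0
    return pruned


# Hypothetical flattened layer weights.
layer_weights = [0.9, -0.05, 0.3, -0.7, 0.01, 0.2, -0.4, 0.08, 0.6, -0.15]
pruned_weights = magnitude_prune(layer_weights, 0.5)
print(pruned_weights)
```

Half of the ten weights are zeroed here while the five largest in magnitude survive. Raising the rate toward the levels reported above removes ever larger weights, which is why accuracy eventually suffers.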

This review was generated by AI and checked by a human editor.