QUICK REVIEW

[Paper Review] Rethinking White-Box Watermarks on Deep Learning Models under Neural Structural Obfuscation

Yifan Yan, Xudong Pan|arXiv (Cornell University)|Mar 17, 2023

Adversarial Robustness in Machine Learning | Cited by 8

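One-Sentence Summary

The paper shows that neural structural obfuscation with dummy neurons can make nine mainstream white-box DNN watermarks unverifiable without harming model utility, by injecting dummy neurons with non-vanishing weights that disrupt watermark verification. It also proposes generation and injection primitives, plus camouflage techniques, to keep the obfuscation stealthy.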

ABSTRACT

Copyright protection for deep neural networks (DNNs) is an urgent need for AI corporations. To trace illegally distributed model copies, DNN watermarking is an emerging technique for embedding and verifying secret identity messages in the prediction behaviors or the model internals. Sacrificing less functionality and involving more knowledge about the target DNN, the latter branch called white-box DNN watermarking is believed to be accurate, credible and secure against most known watermark removal attacks, with emerging research efforts in both the academy and the industry. In this paper, we present the first systematic study on how the mainstream white-box DNN watermarks are commonly vulnerable to neural structural obfuscation with dummy neurons, a group of neurons which can be added to a target model but leave the model behavior invariant. Devising a comprehensive framework to automatically generate and inject dummy neurons with high stealthiness, our novel attack intensively modifies the architecture of the target model to inhibit the success of watermark verification. With extensive evaluation, our work for the first time shows that nine published watermarking schemes require amendments to their verification procedures.

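Motivation and Objectives

Highlight the vulnerability of mainstream white-box DNN watermark verification to neural structural obfuscation.
Propose a comprehensive attack framework based on dummy neurons, with generation and injection primitives.
Demonstrate the attack's effectiveness against nine published white-box watermarking schemes while preserving model utility.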

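Proposed Method

Introduce dummy neurons that alter watermark verification without changing the model's output.
Develop the NeuronClique and NeuronSplit primitives, which create groups of dummy neurons with non-vanishing weights.
Inject dummy neurons from the last layer to the first with stealthiness in mind, exploiting scaling and shuffling invariance.
Apply kernel expansion and weight-distribution tricks to camouflage the obfuscated model.
Provide a defense-oriented discussion, including a dummy neuron elimination method.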
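
The idea behind dummy neurons with non-vanishing weights can be sketched on a small two-layer ReLU network. The construction below is an assumption made for illustration, not the paper's reference implementation: `neuron_clique` appends neurons that share one incoming weight vector and whose outgoing weights sum to zero, and `neuron_split` duplicates an existing neuron and divides its outgoing weights between the two copies. The function names, shapes, and random ranges are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def forward(x, W1, b1, W2, b2):
    """Two-layer ReLU MLP: relu(x @ W1 + b1) @ W2 + b2."""
    return relu(x @ W1 + b1) @ W2 + b2

def neuron_clique(W1, b1, W2, k, rng):
    """Append k dummy hidden neurons (assumed construction).

    All k neurons share the same incoming weights and bias, so they always
    produce the same activation. Their outgoing weights sum to zero per
    output unit, so their total contribution cancels. Use k >= 2.
    """
    w_in = rng.normal(size=(W1.shape[0], 1))
    b_in = rng.normal()
    out = rng.normal(size=(k, W2.shape[1]))
    out -= out.mean(axis=0, keepdims=True)  # each column now sums to zero
    W1_new = np.hstack([W1, np.repeat(w_in, k, axis=1)])
    b1_new = np.concatenate([b1, np.full(k, b_in)])
    W2_new = np.vstack([W2, out])
    return W1_new, b1_new, W2_new

def neuron_split(W1, b1, W2, idx, rng):
    """Split hidden neuron idx into two copies (assumed construction).

    Both copies keep the original incoming weights; the outgoing weights
    are divided as alpha and (1 - alpha), so they add up to the original.
    """
    alpha = rng.uniform(0.2, 0.8, size=W2.shape[1])
    W1_new = np.hstack([W1, W1[:, idx:idx + 1]])
    b1_new = np.concatenate([b1, b1[idx:idx + 1]])
    W2_new = np.vstack([W2, (1.0 - alpha) * W2[idx]])
    W2_new[idx] = alpha * W2[idx]
    return W1_new, b1_new, W2_new

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 8)), rng.normal(size=8)
W2, b2 = rng.normal(size=(8, 3)), rng.normal(size=3)
x = rng.normal(size=(5, 4))

y_ref = forward(x, W1, b1, W2, b2)
W1c, b1c, W2c = neuron_clique(W1, b1, W2, k=3, rng=rng)
W1s, b1s, W2s = neuron_split(W1c, b1c, W2c, idx=2, rng=rng)
y_obf = forward(x, W1s, b1s, W2s, b2)

max_diff = np.abs(y_ref - y_obf).max()
print(W1.shape, "->", W1s.shape, "max output diff:", max_diff)
```

In this toy setting the hidden layer grows from 8 to 12 neurons and none of the added weights is zero, yet the outputs agree up to floating-point error. A verifier that reads weights at fixed positions or expects a fixed layer shape would see a different model.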

Figure 1: A schematic diagram of NeuronZero on (a) fully-connected layers and (b) convolutional layers.

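Experimental Results

Research Questions

RQ1: Can mainstream white-box watermark verification be reliably disrupted without loss of utility or need for data access?
RQ2: Can dummy neurons with non-vanishing weights effectively invalidate watermark verification across multiple schemes?
RQ3: How can dummy neurons be generated automatically and injected stealthily while preserving model functionality?
RQ4: What defense mechanisms exist against this kind of neural structural obfuscation and watermark removal?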

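Key Findings

After the attack, all nine published white-box watermarking schemes fail verification, with verification results dropping to chance level.
The model's normal utility is unchanged after obfuscation.
The attack framework automatically generates and injects dummy neurons using the NeuronClique and NeuronSplit primitives.
Stealthiness is improved through scaling, shuffling, and kernel expansion.
The paper discusses what knowledge a defender needs and provides a dummy neuron elimination algorithm.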
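
The scaling and shuffling invariances mentioned above can be checked numerically. The sketch below assumes a two-layer ReLU network: multiplying a hidden neuron's incoming weights and bias by a positive factor while dividing its outgoing weights by the same factor, and permuting hidden neurons consistently, both leave the output unchanged. The `extract_bits` function is a hypothetical toy extractor (sign of a secret random projection of the flattened weights). It is not one of the nine evaluated schemes and only illustrates why an extractor that reads raw weights is sensitive to such changes.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def forward(x, W1, b1, W2, b2):
    """Two-layer ReLU MLP: relu(x @ W1 + b1) @ W2 + b2."""
    return relu(x @ W1 + b1) @ W2 + b2

def extract_bits(W1, secret):
    """Toy, hypothetical extractor: sign of a secret projection of W1."""
    return (secret @ W1.ravel() > 0).astype(int)

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(4, 8)), rng.normal(size=8)
W2, b2 = rng.normal(size=(8, 3)), rng.normal(size=3)
x = rng.normal(size=(5, 4))

# Toy setup: the owner's message is defined as whatever the extractor
# reads from the original weights, so the bit error rate starts at zero.
secret = rng.normal(size=(64, W1.size))
bits_owner = extract_bits(W1, secret)

# Positive per-neuron scaling: relu(c * z) == c * relu(z) for c > 0.
scale = rng.uniform(0.5, 2.0, size=8)
# Shuffling: reorder hidden neurons consistently in both layers.
perm = rng.permutation(8)

W1o = (W1 * scale)[:, perm]
b1o = (b1 * scale)[perm]
W2o = (W2 / scale[:, None])[perm]

y_ref = forward(x, W1, b1, W2, b2)
y_obf = forward(x, W1o, b1o, W2o, b2)

bits_after = extract_bits(W1o, secret)
ber = float(np.mean(bits_owner != bits_after))
print("max output diff:", np.abs(y_ref - y_obf).max(), "bit error rate:", ber)
```

The layer shapes are unchanged here, so the transformation alone is hard to spot from the architecture, while the toy extractor no longer recovers the original bits. How each published scheme actually reacts depends on its own verification procedure.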

Figure 2: Overview of our proposed watermark removal attack by neural structural obfuscation with dummy neurons.

This review was generated by AI and reviewed by a human editor.