QUICK REVIEW

[Paper Review] Effectiveness of Distillation Attack and Countermeasure on Neural Network Watermarking

Ziqi Yang, Hung Dang | arXiv (Cornell University) | Jun 14, 2019

Adversarial Robustness in Machine Learning | References: 47 | Cited by: 24

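One-Sentence Summary

This paper shows that knowledge distillation severely weakens existing neural network watermarking techniques, because it removes watermark-related parameters that are decoupled from the main classification task. In response, the authors propose *ingrain*, a robust watermarking method that uses a regularization loss to embed the watermark directly into the model's main predictions. The method strongly resists knowledge distillation while maintaining high accuracy and robustness to other common transformations.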

ABSTRACT

The rise of machine learning as a service and model sharing platforms has raised the need of traitor-tracing the models and proof of authorship. Watermarking technique is the main component of existing methods for protecting copyright of models. In this paper, we show that distillation, a widely used transformation technique, is a quite effective attack to remove watermark embedded by existing algorithms. The fragility is due to the fact that distillation does not retain the watermark embedded in the model that is redundant and independent to the main learning task. We design ingrain in response to the destructive distillation. It regularizes a neural network with an ingrainer model, which contains the watermark, and forces the model to also represent the knowledge of the ingrainer. Our extensive evaluations show that ingrain is more robust to distillation attack and its robustness against other widely used transformation techniques is comparable to existing methods.

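Research Motivation and Objectives

Investigate how vulnerable existing neural network watermarking techniques are to model transformations, knowledge distillation in particular.
Identify why current watermarking methods fail under knowledge distillation: the watermark-related parameters are decoupled from the main classification task.
Design a new watermarking method that is robust to knowledge distillation and to other common model transformations.
Ensure the watermark remains intact while main-task accuracy and performance are preserved.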

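Proposed Method

Train a separate "ingrainer" model on a dataset that includes watermark carriers; this model encodes the secret watermark while still producing correct outputs.
Use the ingrainer model's loss as a regularization term when training the main classification model.
Jointly optimize the main model so that, on the same training data, it matches both the ground-truth labels and the ingrainer's outputs.
Tune the regularization weight to balance watermark robustness against model accuracy.
Embed the watermark along the same neural pathways used by the main classification task, reducing its independence from the main function.
Ensure the watermark can still be recovered from the main model's predictions on benign data, even after knowledge distillation.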
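
The joint objective described in the steps above can be sketched for a single training example as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`softmax`, `cross_entropy`, `ingrain_style_loss`), the parameter name `reg_weight`, and the use of cross-entropy for both terms are hypothetical choices made for the example.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, predicted_probs, eps=1e-12):
    """Cross-entropy between a target and a predicted distribution."""
    return -sum(t * math.log(p + eps)
                for t, p in zip(target_probs, predicted_probs))


def ingrain_style_loss(model_logits, ingrainer_logits, true_label, reg_weight):
    """Task loss plus a weighted term that pulls the main model toward
    the ingrainer's outputs (assumed form, for illustration only)."""
    one_hot = [1.0 if i == true_label else 0.0
               for i in range(len(model_logits))]
    model_probs = softmax(model_logits)
    ingrainer_probs = softmax(ingrainer_logits)
    task_loss = cross_entropy(one_hot, model_probs)
    ingrain_term = cross_entropy(ingrainer_probs, model_probs)
    return task_loss + reg_weight * ingrain_term


# Hypothetical logits for a 3-class problem.
model_logits = [2.0, 0.5, -1.0]
ingrainer_logits = [1.5, 1.0, -0.5]
loss = ingrain_style_loss(model_logits, ingrainer_logits,
                          true_label=0, reg_weight=0.5)
```

Setting `reg_weight` to zero recovers ordinary supervised training; raising it puts more pressure on the main model to reproduce the ingrainer's outputs, which is the robustness-versus-accuracy trade-off the method tunes.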

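Experimental Results

Research Questions

RQ1: Can knowledge distillation effectively remove watermarks embedded by existing neural network watermarking techniques?
RQ2: Why do current watermarking methods fail under knowledge distillation even when model accuracy is unchanged?
RQ3: How can a watermarking technique be made robust to knowledge distillation while preserving main-task performance?
RQ4: Can the watermark be integrated into the main model's prediction process to strengthen resistance to model transformations?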

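Key Findings

Because watermark-related parameters are decoupled from the main classification function, knowledge distillation effectively removes watermarks embedded by existing techniques, even when the accuracy loss is negligible.
Existing watermarking techniques embed the watermark in redundant, independent model components that are discarded during distillation, so the watermark is erased entirely.
*Ingrain* resists the knowledge distillation attack by embedding the watermark along the same model pathways as the main classification task, so the watermark survives distillation.
Under other common transformations such as pruning and quantization, the proposed method is about as robust as existing methods.
By using the ingrainer's loss as a regularization term, *ingrain* trains the classification and watermarking objectives jointly, which strengthens the model's robustness.
The method allows a tunable trade-off between watermark robustness and model accuracy, which makes it practical to deploy.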
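
For context on the attack itself, the standard knowledge distillation objective trains a student to match the teacher's temperature-softened outputs on ordinary inputs. The sketch below illustrates that objective for one example; it is not the paper's experimental code, and the names `softened_probs` and `distillation_loss` as well as the default temperature of 4.0 are assumptions made for the example.

```python
import math


def softened_probs(logits, temperature):
    """Softmax over logits divided by a temperature (higher = softer)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence from the teacher's softened distribution to the
    student's softened distribution for one input."""
    teacher = softened_probs(teacher_logits, temperature)
    student = softened_probs(student_logits, temperature)
    return sum(t * math.log(t / s) for t, s in zip(teacher, student))


# Hypothetical logits: the student is penalized only for disagreeing
# with the teacher's predictions on the inputs it is shown.
teacher_logits = [3.0, 1.0, -2.0]
student_logits = [2.0, 1.5, -1.0]
loss = distillation_loss(student_logits, teacher_logits)
```

Since the student learns only from the teacher's predictions on the inputs used for distillation, any watermark behavior that does not show up in those predictions has no path into the student. This is consistent with the finding that watermarks kept independent of the main task are lost, while one embedded in the main predictions can carry over.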


This review was generated by AI and reviewed by human editors.