QUICK REVIEW

[Paper Review] Untargeted Backdoor Watermark: Towards Harmless and Stealthy Dataset Copyright Protection

Yiming Li, Yang Bai|arXiv (Cornell University)|Sep 27, 2022

Adversarial Robustness in Machine Learning | Cited by 26

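ONE-SENTENCE SUMMARY

The paper proposes the untargeted backdoor watermark (UBW) for harmless and stealthy ownership verification of open-source datasets, with a poisoned-label variant (UBW-P), a clean-label variant (UBW-C), and a hypothesis-test-based verification method.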

ABSTRACT

Deep neural networks (DNNs) have demonstrated their superiority in practice. Arguably, the rapid development of DNNs is largely benefited from high-quality (open-sourced) datasets, based on which researchers and developers can easily evaluate and improve their learning methods. Since the data collection is usually time-consuming or even expensive, how to protect their copyrights is of great significance and worth further exploration. In this paper, we revisit dataset ownership verification. We find that existing verification methods introduced new security risks in DNNs trained on the protected dataset, due to the targeted nature of poison-only backdoor watermarks. To alleviate this problem, in this work, we explore the untargeted backdoor watermarking scheme, where the abnormal model behaviors are not deterministic. Specifically, we introduce two dispersibilities and prove their correlation, based on which we design the untargeted backdoor watermark under both poisoned-label and clean-label settings. We also discuss how to use the proposed untargeted backdoor watermark for dataset ownership verification. Experiments on benchmark datasets verify the effectiveness of our methods and their resistance to existing backdoor defenses. Our codes are available at https://github.com/THUYimingLi/Untargeted_Backdoor_Watermark.

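RESEARCH MOTIVATION AND OBJECTIVES

Revisit dataset ownership verification and identify the security risks introduced by targeted backdoor watermarks.
Introduce an untargeted backdoor watermark whose induced model behavior is non-deterministic and therefore harmless.
Develop two schemes: UBW-P (poisoned-label) and UBW-C (clean-label), the latter formulated as a bi-level optimization.
Propose a hypothesis-test-based dataset ownership verification method that relies on the UBW signal.
Empirically validate, on benchmark datasets, the effectiveness of UBW and its resistance to backdoor defenses.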

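PROPOSED METHOD

Define the averaged prediction dispersibility D_p, which measures how dispersed the predictions are for samples sharing the same ground-truth label.
Introduce UBW-P, which randomly relabels the poisoned samples so that models trained on the modified dataset carry the watermark.
Develop UBW-C via bi-level optimization: the poisoned subset is optimized to maximize a differentiable surrogate dispersibility while all labels are left unchanged.
Provide two differentiable dispersibility surrogates (D_s and D_c) that make the UBW-C optimization tractable.
Propose hypothesis-test-based dataset ownership verification using a paired test between benign and poisoned inputs (H0: P_b = P_p + tau).
Demonstrate the robustness of UBW against defenses, including its resistance to fine-tuning and pruning.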
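
As a rough illustration of the first two items above, the following Python sketch relabels a random subset of samples (the label side of UBW-P) and computes an entropy-based dispersibility over hard predictions. It is a minimal sketch under stated assumptions, not the authors' implementation: the names relabel_poisoned and prediction_dispersibility, the poison_rate and seed parameters, uniform sampling over all classes, and the use of Shannon entropy weighted by class size are all assumptions, since this summary does not give the exact formulas. Stamping the trigger pattern onto the poisoned images is omitted.

```python
import math
import random
from collections import Counter, defaultdict


def relabel_poisoned(labels, num_classes, poison_rate, seed=0):
    """Pick a random subset and give each picked sample a random label.

    Assumption: labels are drawn uniformly from all classes, so a poisoned
    sample may occasionally keep its original label.
    """
    rng = random.Random(seed)
    n_poison = int(round(poison_rate * len(labels)))
    poisoned_idx = sorted(rng.sample(range(len(labels)), n_poison))
    new_labels = list(labels)
    for i in poisoned_idx:
        new_labels[i] = rng.randrange(num_classes)
    return new_labels, poisoned_idx


def entropy(counts):
    """Shannon entropy (natural log) of a histogram of counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def prediction_dispersibility(true_labels, predicted_labels):
    """Class-size-weighted mean entropy of predictions within each true class.

    Assumption: this entropy form stands in for D_p; 0 means every sample of
    a class receives the same prediction, larger means more spread out.
    """
    by_class = defaultdict(Counter)
    for y, pred in zip(true_labels, predicted_labels):
        by_class[y][pred] += 1
    n = len(true_labels)
    return sum(sum(c.values()) * entropy(c.values()) for c in by_class.values()) / n


if __name__ == "__main__":
    labels = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
    new_labels, idx = relabel_poisoned(labels, num_classes=5, poison_rate=0.2)
    print("poisoned indices:", idx)
    # A targeted backdoor sends every triggered sample to one class,
    # an untargeted one spreads them across classes.
    print(prediction_dispersibility([0, 0, 0, 0], [1, 1, 1, 1]))
    print(prediction_dispersibility([0, 0, 0, 0], [0, 1, 2, 3]))
```

Under this sketch a targeted watermark scores zero, because all triggered samples of a class land on one label, while an untargeted watermark scores higher as predictions spread across classes.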

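EXPERIMENTAL RESULTS

Research Questions

RQ1: Can an untargeted backdoor watermark induce non-deterministic (dispersed) yet detectable behavior in networks trained on the watermarked data?
RQ2: How can UBW-P and UBW-C be constructed and optimized to balance effectiveness, stealthiness, and dispersibility?
RQ3: Can UBW-based signals support harmless and stealthy ownership verification of a suspicious model?
RQ4: Are the UBW schemes robust to common backdoor defenses and model-modification techniques?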

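Key Findings

UBW reaches an attack success rate (ASR) and dataset watermarking performance comparable to targeted backdoors, while its dispersibility is higher than that of most baselines, indicating that the malicious behavior is non-deterministic.
On CIFAR-10 and ImageNet, UBW-P shows a high ASR and a D_p significantly higher than that of baseline poisoned-label attacks.
UBW-C is stealthier than other clean-label watermarks while retaining a considerable ASR and a competitive D_p in practice.
UBW-based verification reliably identifies unauthorized dataset use with high confidence (low p-values) across multiple scenarios, while keeping false positives on independent models to a minimum.
UBW is robust to fine-tuning and pruning defenses and retains a significant ASR under adaptive defenses.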
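
The low p-values mentioned above come from a paired hypothesis test on a suspicious model. The Python sketch below shows one way such a test could look, assuming P_b and P_p are the model's predicted probabilities on the ground-truth label for benign inputs and their watermarked versions, and assuming the one-sided alternative H1: P_b > P_p + tau. The function name verify_ownership, the defaults tau=0.2 and alpha=0.05, and the normal approximation to the Student-t tail (chosen to stay within the standard library, reasonable only for many pairs) are all assumptions rather than details taken from the paper.

```python
import math
from statistics import NormalDist, mean, stdev


def verify_ownership(p_benign, p_poisoned, tau=0.2, alpha=0.05):
    """One-sided paired test of H0: P_b = P_p + tau vs H1: P_b > P_p + tau.

    Returns (t_stat, p_value, rejected). Rejecting H0 suggests the model
    was trained on the watermarked dataset.
    """
    if len(p_benign) != len(p_poisoned) or len(p_benign) < 2:
        raise ValueError("need at least two paired probabilities")
    # Shift each paired difference by tau so that H0 corresponds to mean 0.
    diffs = [b - p - tau for b, p in zip(p_benign, p_poisoned)]
    spread = stdev(diffs)
    if spread == 0.0:
        # Degenerate case: all shifted differences are identical.
        first = diffs[0]
        t_stat = math.inf if first > 0 else (-math.inf if first < 0 else 0.0)
    else:
        t_stat = mean(diffs) / (spread / math.sqrt(len(diffs)))
    # Assumption: normal approximation instead of the exact Student-t tail.
    p_value = 1.0 - NormalDist().cdf(t_stat)
    return t_stat, p_value, p_value < alpha


if __name__ == "__main__":
    # Hypothetical probabilities on the true label for 100 paired inputs.
    benign = [0.9 + 0.001 * (i % 10) for i in range(100)]
    marked = [0.2 + 0.002 * (i % 7) for i in range(100)]
    t_stat, p_value, rejected = verify_ownership(benign, marked)
    print(f"t={t_stat:.2f}, p={p_value:.3g}, dataset use detected: {rejected}")
```

A model trained independently of the dataset should keep similar confidence on benign and watermarked inputs, so the shifted differences stay negative and H0 is not rejected. A larger tau makes the test more conservative.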

This review was generated by AI and reviewed by human editors.