QUICK REVIEW

[Paper Review] SCALE-UP: An Efficient Black-box Input-level Backdoor Detection via Analyzing Scaled Prediction Consistency

Junfeng Guo, Yiming Li|arXiv (Cornell University)|Feb 7, 2023

Adversarial Robustness in Machine Learning · Cited by 19

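One-Sentence Summary

SCALE-UP detects backdoor inputs in the black-box MLaaS setting by measuring the scaled prediction consistency (SPC) of pixel-amplified inputs. It works in both data-free and data-limited settings, is supported by a theoretical analysis, and shows strong empirical results.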

ABSTRACT

Deep neural networks (DNNs) are vulnerable to backdoor attacks, where adversaries embed a hidden backdoor trigger during the training process for malicious prediction manipulation. These attacks pose great threats to the applications of DNNs under the real-world machine learning as a service (MLaaS) setting, where the deployed model is fully black-box while the users can only query and obtain its predictions. Currently, there are many existing defenses to reduce backdoor threats. However, almost all of them cannot be adopted in MLaaS scenarios since they require getting access to or even modifying the suspicious models. In this paper, we propose a simple yet effective black-box input-level backdoor detection, called SCALE-UP, which requires only the predicted labels to alleviate this problem. Specifically, we identify and filter malicious testing samples by analyzing their prediction consistency during the pixel-wise amplification process. Our defense is motivated by an intriguing observation (dubbed scaled prediction consistency) that the predictions of poisoned samples are significantly more consistent compared to those of benign ones when amplifying all pixel values. Besides, we also provide theoretical foundations to explain this phenomenon. Extensive experiments are conducted on benchmark datasets, verifying the effectiveness and efficiency of our defense and its resistance to potential adaptive attacks. Our codes are available at https://github.com/JunfengGo/SCALE-UP.

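Research Motivation and Goals

Reveal a prediction-consistency phenomenon (scaled prediction consistency) that distinguishes poisoned samples from benign ones when pixel values are amplified.
Provide a theoretical explanation of scaled prediction consistency.
Propose SCALE-UP, a black-box input-level backdoor detector usable in both data-free and data-limited settings.
Demonstrate effectiveness and efficiency through extensive experiments, and evaluate resistance to adaptive attacks.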

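Proposed Method

Study pixel-wise amplification and how it affects an attacked model's predictions on benign and poisoned inputs.
Define scaled prediction consistency (SPC) as the fraction of amplified images whose predicted label matches the label predicted for the original input.
Develop data-free SCALE-UP: compute the SPC of a suspicious input over a preset set of scaling factors and classify the input by comparing its SPC with a threshold.
Extend this to data-limited SCALE-UP by normalizing SPC with class-wise means and standard deviations computed from a small set of benign samples, which reduces the effect of differences between classes.
Provide theoretical support for scaled prediction consistency through an analysis inspired by the neural tangent kernel (NTK).
Evaluate the method against six representative backdoor attacks on CIFAR-10 and Tiny ImageNet, and compare it with other black-box defenses.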
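The SPC definition and the data-free decision rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation; their code is in the repository linked in the abstract. The sketch makes several assumptions:

- Images are flattened lists of pixels in [0, 1].
- Amplification multiplies every pixel by a factor and clips the result to [0, 1].
- The scaling set `SCALES` and the threshold of 0.5 are hypothetical choices.
- `toy_model` is a made-up backdoored classifier that returns only labels, mimicking MLaaS access.

```python
from typing import Callable, List, Sequence

Image = List[float]                  # flattened pixels, assumed normalized to [0, 1]
Classifier = Callable[[Image], int]  # black-box API: returns only a predicted label

# Hypothetical scaling set chosen for illustration; not necessarily the paper's.
SCALES: Sequence[float] = (3, 5, 7, 9, 11)


def amplify(x: Image, n: float) -> Image:
    """Pixel-wise amplification: multiply every pixel by n, then clip to [0, 1] (assumed)."""
    return [min(max(n * p, 0.0), 1.0) for p in x]


def spc(model: Classifier, x: Image, scales: Sequence[float] = SCALES) -> float:
    """Fraction of amplified copies of x whose predicted label equals the label of x."""
    base = model(x)
    hits = sum(1 for n in scales if model(amplify(x, n)) == base)
    return hits / len(scales)


def is_poisoned_data_free(model: Classifier, x: Image, threshold: float = 0.5,
                          scales: Sequence[float] = SCALES) -> bool:
    """Data-free rule: flag x as poisoned when its SPC exceeds the defender's threshold."""
    return spc(model, x, scales) > threshold


# --- Toy "backdoored" black box, purely hypothetical ---
def toy_model(x: Image) -> int:
    # Trigger: pixel 0 saturated and pixel 1 dark -> attacker's target label 7.
    if x[0] >= 0.99 and x[1] <= 0.01:
        return 7
    # Otherwise a brightness rule that amplification easily flips.
    return 0 if sum(x) / len(x) < 0.5 else 1


benign = [0.1, 0.2, 0.3, 0.05]
poisoned = [1.0, 0.0, 0.3, 0.2]
print(spc(toy_model, benign), spc(toy_model, poisoned))  # 0.2 1.0
```

In this toy run, the triggered input keeps its target label at every scale, so its SPC is 1.0. The benign input changes label once amplification brightens it, so its SPC is 0.2, and a threshold of 0.5 separates the two. The detector only calls `model(...)` to obtain a label, which is what makes the approach usable in a black-box setting.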

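Experimental Results

Research Questions

RQ1: In the black-box setting, can poisoned and benign samples be distinguished by their prediction behavior under pixel-wise amplification?
RQ2: Does scaled prediction consistency provide a robust, data-efficient signal for detecting backdoors without white-box access to the model?
RQ3: How can SCALE-UP be adapted to data-free and data-limited scenarios while keeping it efficient and accurate?
RQ4: Can advanced adaptive backdoor strategies evade SPC-based detection?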

Key Findings

Attack	STRIP	ShrinkPad	DeepSweep	Frequency	SCALE-UP (data-free)	SCALE-UP (data-limited)	Average
BadNets	0.989	0.951	0.967	0.891	0.971	0.971	0.895
Label-Consistent	0.941	0.957	0.921	0.889	0.947	0.954	0.915
PhysicalBA	0.971	0.631	0.946	0.881	0.969	0.970	0.896
TUAP	0.671	0.869	0.743	0.851	0.816	0.830	0.792
WaNet	0.475	0.531	0.506	0.461	0.918	0.925	0.672
ISSBA	0.498	0.513	0.729	0.497	0.945	0.945	0.614
Average	0.8??	0.733??	0.83??	0.657??	0.918??	0.945??	N/A

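Under pixel-wise amplification, the attacked model's predictions on poisoned samples are more stable than its predictions on benign samples (scaled prediction consistency).
SCALE-UP achieves high AUROC across multiple attacks and datasets and outperforms several baselines, including methods that require probability vectors.
Data-free SCALE-UP flags malicious inputs by comparing their SPC with a defender-set threshold. Data-limited SCALE-UP improves accuracy by normalizing SPC with class statistics computed from benign samples.
SCALE-UP remains effective against both patch-based and non-patch-based backdoors and is resilient to adaptive attacks. The exception is one strongly regularized adaptive attack, which can be mitigated by adding a small amount of random noise.
Inference overhead is moderate: SCALE-UP is usually faster than many baselines and only slightly slower than standard inference.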
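The class-wise normalization used by the data-limited variant can be sketched as a z-score. The sketch assumes that SPC values have already been computed for a small labeled benign set and for the suspicious input. The names `fit_class_stats` and `normalized_spc`, the small `eps` term and the threshold of 2.0 are all hypothetical. The exact normalization and threshold in the paper may differ.

```python
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Stats = Dict[int, Tuple[float, float]]  # class -> (mean SPC, std of SPC)


def fit_class_stats(benign: Iterable[Tuple[int, float]]) -> Stats:
    """Group benign (label, SPC) pairs by class and record each class's mean and std."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for label, score in benign:
        groups[label].append(score)
    # Population std; a class with a single sample gets std 0 (eps below avoids /0).
    return {c: (statistics.fmean(v), statistics.pstdev(v)) for c, v in groups.items()}


def normalized_spc(score: float, predicted_label: int, stats: Stats,
                   eps: float = 1e-6) -> float:
    """Z-score of an input's SPC relative to benign samples of its predicted class.

    Assumes every class the model can predict has at least one benign sample;
    an unseen class raises KeyError.
    """
    mu, sigma = stats[predicted_label]
    return (score - mu) / (sigma + eps)


def is_poisoned_data_limited(score: float, predicted_label: int, stats: Stats,
                             threshold: float = 2.0) -> bool:
    """Flag the input when its normalized SPC exceeds the defender's threshold."""
    return normalized_spc(score, predicted_label, stats) > threshold


# Toy benign SPC values (hypothetical, for illustration only).
stats = fit_class_stats([(0, 0.2), (0, 0.4), (1, 0.6), (1, 0.8), (1, 0.7)])
print(is_poisoned_data_limited(0.8, 0, stats))  # True
print(is_poisoned_data_limited(0.8, 1, stats))  # False
```

With these toy statistics, the same raw SPC of 0.8 is flagged for class 0 (benign mean 0.3) but not for class 1 (benign mean 0.7). This illustrates why per-class normalization reduces the effect of class differences.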

This review was generated by AI and reviewed by human editors.