QUICK REVIEW

[Paper Review] Contract And Conquer: How to Provably Compute Adversarial Examples for a Black-Box Model?

Anna Chistyakova, Mikhail Pautov|arXiv (Cornell University)|Mar 11, 2026

Adversarial Robustness in Machine Learning|Cited by 0

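One-Sentence Summary

CAC is a provable, transfer-based method for computing adversarial examples against black-box models: it iteratively distills a surrogate model and contracts the search space, with a convergence guarantee.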

ABSTRACT

Black-box adversarial attacks are widely used as tools to test the robustness of deep neural networks against malicious perturbations of input data aimed at a specific change in the output of the model. Such methods, although they remain empirically effective, usually do not guarantee that an adversarial example can be found for a particular model. In this paper, we propose Contract And Conquer (CAC), an approach to provably compute adversarial examples for neural networks in a black-box manner. The method is based on knowledge distillation of a black-box model on an expanding distillation dataset and precise contraction of the adversarial example search space. CAC is supported by the transferability guarantee: we prove that the method yields an adversarial example for the black-box model within a fixed number of algorithm iterations. Experimentally, we demonstrate that the proposed approach outperforms existing state-of-the-art black-box attack methods on ImageNet dataset for different target models, including vision transformers.

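Motivation and Objectives

Motivate robustness testing of black-box models deployed in safety-critical systems.
Develop a method with theoretical guarantees for generating adversarial examples against a target model.
Carry out transfer-based attacks using knowledge distillation and a contracting search space.
Compare CAC with existing black-box attacks on ImageNet and CIFAR-10.
Demonstrate practical effectiveness on targets that include Vision Transformers.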

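Proposed Method

Iteratively distill a surrogate $S$ from the target $T$ on a distillation dataset of points near the target point $x$.
Attack the surrogate $S$ with MI-FGSM in the white-box setting, restricted to the search space $U_{\delta}(x)$, which is contracted after every iteration.
Transferability check: if the example $z_j$ crafted on $S$ also fools $T$, stop and output $z_j$; otherwise add $(z_j, T(z_j))$ to the distillation dataset and tighten the search space via $U_{\delta}(x) \leftarrow U_{\delta}(x) \cap U_{\rho_j}(z_j)$, where $\rho_j = t\,\|z_j - z_{j-1}\|_{\infty}$.
Under a fixed query budget for the target model, adjust the gradient step size $\alpha$ according to the contracted radius.
Convergence guarantee: under mild assumptions on the surrogate, an adversarial example that transfers to $T$ is found within a bounded number of iterations.
Note: CAC is architecture-agnostic; MI-FGSM serves as the white-box attack that guides the search.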
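
The loop described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the search space is taken to be an l∞ ball and is stored as an axis-aligned box (an intersection of l∞ balls is always such a box), the attack is treated as untargeted, the first contraction uses $x$ as the previous point, and the distillation dataset starts from the single pair $(x, y)$. The names `cac_loop`, `train_surrogate`, `attack`, and `query_target` are hypothetical placeholders for the distillation, white-box attack, and black-box query steps; the value of `t` is left to the caller because this summary does not state its range.

```python
from typing import Callable, List, Optional, Tuple

Point = List[float]
Box = Tuple[Point, Point]  # per-coordinate (lower, upper) bounds


def linf_ball(center: Point, radius: float) -> Box:
    """Represent the l-infinity ball around `center` as an axis-aligned box."""
    return ([c - radius for c in center], [c + radius for c in center])


def linf_dist(a: Point, b: Point) -> float:
    """l-infinity distance between two points."""
    return max(abs(ai - bi) for ai, bi in zip(a, b))


def contract(box: Box, z_j: Point, z_prev: Point, t: float) -> Box:
    """Intersect the box with the rho_j-ball around z_j,
    where rho_j = t * ||z_j - z_prev||_inf."""
    rho = t * linf_dist(z_j, z_prev)
    lo, hi = box
    b_lo, b_hi = linf_ball(z_j, rho)
    return ([max(l, bl) for l, bl in zip(lo, b_lo)],
            [min(h, bh) for h, bh in zip(hi, b_hi)])


def project(point: Point, box: Box) -> Point:
    """Clip a point coordinate-wise into the box."""
    lo, hi = box
    return [min(max(p, l), h) for p, l, h in zip(point, lo, hi)]


def mi_fgsm_step(z: Point, grad: Point, momentum: Point, box: Box,
                 alpha: float, mu: float = 1.0) -> Tuple[Point, Point]:
    """One MI-FGSM update: add the l1-normalised gradient to the momentum
    buffer, step along its sign, and project back into the search box."""
    l1 = sum(abs(g) for g in grad) or 1.0  # avoid division by zero
    momentum = [mu * m + g / l1 for m, g in zip(momentum, grad)]
    sign = [(v > 0) - (v < 0) for v in momentum]
    z_new = project([zi + alpha * s for zi, s in zip(z, sign)], box)
    return z_new, momentum


def cac_loop(x: Point, y: int, delta: float, t: float,
             query_target: Callable[[Point], int],
             train_surrogate: Callable[[list], object],
             attack: Callable[[object, Point, int, Box], Point],
             max_iters: int) -> Optional[Point]:
    """Alternate distillation, white-box attack, and contraction (sketch)."""
    box = linf_ball(x, delta)
    dataset = [(x, y)]          # assumption: seed with the target point only
    z_prev = x                  # assumption: z_0 = x
    for _ in range(max_iters):
        surrogate = train_surrogate(dataset)
        z = attack(surrogate, x, y, box)   # e.g. repeated mi_fgsm_step calls
        label = query_target(z)            # one black-box query
        if label != y:                     # untargeted success on the target
            return z
        dataset.append((z, label))         # augment the distillation set
        box = contract(box, z, z_prev, t)  # tighten the search space
        z_prev = z
    return None
```

Because the new ball is centred on $z_j$, which already lies inside the current box, the intersection in `contract` always contains at least $z_j$. If two consecutive candidates coincide, the radius becomes zero and the box collapses to a single point, so a practical version would need to handle that case.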

Figure 1: Illustration of the contraction of the adversarial example search space. Given the number $j$ of algorithm iteration, the adversarial example search space on iteration $j$, namely, $U_{\delta}(x)_{j}$, is the intersection of the $\rho_{j}$-vicinity of an adversarial example $z_{j}$ with

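Experimental Results

Research Questions

RQ1: Can a black-box model be provably attacked within a fixed number of iterations using a transfer-based, distillation-driven method?
RQ2: Under what conditions does the surrogate model guarantee that adversarial examples transfer to the black-box target?
RQ3: How does contracting the adversarial search space affect convergence and attack efficiency?
RQ4: How does CAC compare with existing black-box attacks on ImageNet and CIFAR-10 in the hard-label and soft-label settings, including Transformer targets?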

Key Findings

Method	ASR	AQN	Avg l2	Std l2	Avg l∞	Std l∞
CAC (ours)	1.00	487.95	35.074	18.833	0.153	0.080
HopSkipJump l2	1.00	500.31	48.838	29.118	0.539	0.280
HopSkipJump l∞	1.00	500.01	73.255	35.856	0.361	0.202
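
For readers who want to reproduce columns of this kind from their own attack logs, the following is a small sketch of how such summary statistics could be computed. It rests on several assumptions that this review does not confirm: AQN is read as the average number of target-model queries per example, distances are averaged over successful attacks only, and the standard deviation is the population form. The function name `summarize` and the record keys are hypothetical.

```python
import math
from statistics import mean, pstdev


def l2(a, b):
    """Euclidean distance between two flattened inputs."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def linf(a, b):
    """l-infinity distance between two flattened inputs."""
    return max(abs(ai - bi) for ai, bi in zip(a, b))


def summarize(results):
    """Aggregate per-example attack records.

    Each record is a dict with keys 'x' (original input), 'z' (attack
    output), 'success' (bool), and 'queries' (int).
    """
    wins = [r for r in results if r["success"]]
    summary = {
        "ASR": len(wins) / len(results),                     # success rate
        "AQN": mean([r["queries"] for r in results]),        # assumed meaning
    }
    for name, dist in (("l2", l2), ("linf", linf)):
        d = [dist(r["x"], r["z"]) for r in wins]
        # Distances are undefined when no attack succeeded.
        summary["Avg " + name] = mean(d) if d else None
        summary["Std " + name] = pstdev(d) if d else None
    return summary
```

Swapping `pstdev` for `statistics.stdev` gives the sample standard deviation instead; which of the two the paper reports is not stated here.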

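On ImageNet and CIFAR-10, CAC reaches a high attack success rate (ASR) while producing adversarial examples closer to the target point than several baseline methods.
In the hard-label ImageNet experiments, CAC reaches ASR = 1.00 with competitive average l2 and l∞ distances on the target models (ResNet-50 and ViT-B).
In the soft-label setting, CAC remains strong, with ASR = 1.00 and distance metrics more favorable than those of the baselines.
On CIFAR-10, CAC generally yields closer adversarial examples (lower l∞ and l2) and a higher ASR than baselines such as HopSkipJump, SignOPT, GeoDA, SquareAttack, and SparseRS.
The authors provide a theoretical convergence lemma that, assuming bounded surrogate gradients, bounds the number of iterations needed to obtain a transferable adversarial example.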

Figure 2: Schematic representation of the proposed method. Given alternation iteration $j$ and the target model $T$, we prepare the distillation dataset $\mathcal{D}(S)$ and train the surrogate model $S_{j}$. Then, $S_{j}$ is attacked at the target point $x$ in the white-box setting, and an adver

This review was generated by AI and reviewed by human editors.