QUICK REVIEW

[Paper Review] Contract And Conquer: How to Provably Compute Adversarial Examples for a Black-Box Model?

Anna Chistyakova, Mikhail Pautov | arXiv (Cornell University) | March 11, 2026

Adversarial Robustness in Machine Learning | Citations: 0

One-Line Summary

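CAC is a provable, transfer-based method for computing adversarial examples for a black-box model: it iteratively distills a surrogate model and contracts the search space, and it comes with a convergence guarantee.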

ABSTRACT

Black-box adversarial attacks are widely used as tools to test the robustness of deep neural networks against malicious perturbations of input data aimed at a specific change in the output of the model. Such methods, although they remain empirically effective, usually do not guarantee that an adversarial example can be found for a particular model. In this paper, we propose Contract And Conquer (CAC), an approach to provably compute adversarial examples for neural networks in a black-box manner. The method is based on knowledge distillation of a black-box model on an expanding distillation dataset and precise contraction of the adversarial example search space. CAC is supported by the transferability guarantee: we prove that the method yields an adversarial example for the black-box model within a fixed number of algorithm iterations. Experimentally, we demonstrate that the proposed approach outperforms existing state-of-the-art black-box attack methods on the ImageNet dataset for different target models, including vision transformers.

Research Motivation and Goals

Motivate robustness testing for black-box models in safety-critical systems.
Develop a method with theoretical guarantees to produce adversarial examples for a target model.
Leverage knowledge distillation and a contracting search space to enable transfer-based attacks.
Compare CAC against existing black-box attacks on ImageNet and CIFAR-10.
Demonstrate practical effectiveness on targets including Vision Transformers.

Proposed Method

Iteratively distill a surrogate S from the target T using a distillation dataset centered near the target point x.
Attack the surrogate S in a white-box setting using MI-FGSM within an evolving search space Uδ(x) that contracts after each iteration.
Transferability check: if z_j crafted on S also fools T, stop and output z_j; otherwise augment the distillation dataset with (z_j, T(z_j)) and tighten the search space via Uδ(x) ← Uδ(x) ∩ Uρ_j(z_j) with ρ_j = t||z_j − z_{j−1}||∞.
Use a fixed budget of target-model queries and adjust the gradient step α based on the contraction radius.
Provide a convergence guarantee: under assumptions on the surrogate (bounded gradients), an adversarial example transferable to T is found within a bounded number of iterations.
Note: CAC is architecture-agnostic; MI-FGSM serves as the illustrative white-box attack.
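The steps above can be sketched end to end on a toy problem. This is not the authors' implementation: the 2-D linear target, the logistic-regression surrogate (standing in for a distilled neural network), the uniform initial distillation set, the choice z_0 = x, and the rule tying α to the box width are all illustrative assumptions. Only the loop structure (distill S, attack S with MI-FGSM inside the current search space, query T, then augment the dataset and contract) follows the description. Because an intersection of ℓ∞-balls is an axis-aligned box, the search space is stored as per-coordinate (lo, hi) bounds.

```python
import math
import random

# Hypothetical black-box target T: a hard-label linear classifier in 2-D.
# The attack code below only calls target() and only sees the returned label.
W_T, B_T = (1.0, 1.0), -1.0


def target(z):
    return int(W_T[0] * z[0] + W_T[1] * z[1] + B_T > 0)


def sigmoid(s):
    s = max(-30.0, min(30.0, s))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-s))


def train_surrogate(data, epochs=200, lr=0.5):
    """Distill a 2-D logistic-regression surrogate S from (point, T-label) pairs."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        gw, gb = [0.0, 0.0], 0.0
        for z, y in data:
            err = sigmoid(w[0] * z[0] + w[1] * z[1] + b) - y
            gw = [gw[0] + err * z[0], gw[1] + err * z[1]]
            gb += err
        w = [w[i] - lr * gw[i] / len(data) for i in range(2)]
        b -= lr * gb / len(data)
    return w, b


def mi_fgsm(x, grad, box, steps=10, alpha=0.1, mu=1.0):
    """MI-FGSM starting at x, projecting every iterate onto box [(lo, hi), ...]."""
    z, g = list(x), [0.0] * len(x)
    for _ in range(steps):
        d = grad(z)
        l1 = sum(abs(v) for v in d) or 1.0  # L1-normalize the raw gradient
        g = [mu * gi + di / l1 for gi, di in zip(g, d)]  # momentum accumulation
        z = [zi + alpha * ((gi > 0) - (gi < 0)) for zi, gi in zip(z, g)]  # sign step
        z = [min(max(zi, lo), hi) for zi, (lo, hi) in zip(z, box)]  # projection
    return z


def contract(box, z, rho):
    """Intersect the box with the l_inf ball of radius rho around z."""
    return [(max(lo, zi - rho), min(hi, zi + rho)) for (lo, hi), zi in zip(box, z)]


def cac(x, delta, budget=50, t=0.5, n_init=20, steps=10, seed=0):
    """Return (adversarial z or None, number of target queries used)."""
    rng = random.Random(seed)
    y_x, queries = target(x), 1
    # Assumption: initial distillation set = uniform samples in the delta-ball.
    data = []
    for _ in range(n_init):
        p = [xi + rng.uniform(-delta, delta) for xi in x]
        data.append((p, target(p)))
        queries += 1
    box = [(xi - delta, xi + delta) for xi in x]  # U_delta(x) as a box
    z_prev = list(x)  # assumption: z_0 = x
    sgn = 1.0 if y_x == 0 else -1.0  # push S's logit toward the other class
    while queries < budget:
        w, _ = train_surrogate(data)  # surrogate S_j
        alpha = max(hi - lo for lo, hi in box) / steps  # step tied to box size
        z = mi_fgsm(x, lambda _z: [sgn * wi for wi in w], box, steps, alpha)
        y_z = target(z)
        queries += 1
        if y_z != y_x:  # transferability check succeeded
            return z, queries
        data.append((z, y_z))  # augment the distillation dataset with (z_j, T(z_j))
        rho = t * max(abs(a - c) for a, c in zip(z, z_prev))
        box = contract(box, z, rho)
        z_prev = z
    return None, queries
```

For example, `cac([0.0, 0.0], delta=0.8)` returns either a point that flips the toy target's label together with the number of queries spent, or `None` once the query budget is exhausted. Every call to `target` increments the counter, so the loop respects the fixed query budget by construction; the sketch does not reproduce the paper's convergence guarantee.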

Figure 1 : Illustration of the contraction of the adversarial example search space. Given algorithm iteration $j$, the adversarial example search space on iteration $j$, namely $U_{\delta}(x)_{j}$, is the intersection of the $\rho_{j}$-vicinity of an adversarial example $z_{j}$ with
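The contraction shown in Figure 1 can be written out coordinate-wise. The indexing $U^{(j)}$ for the search space after iteration $j$ and the bound names $\ell^{(j)}_i, u^{(j)}_i$ are our own notation (the review overwrites $U_\delta(x)$ in place); the derivation relies only on the standard fact that an $\ell_\infty$-ball is an axis-aligned box, so every contracted search space remains a box:

```latex
\begin{aligned}
U^{(0)} &= U_{\delta}(x) = \prod_{i=1}^{n} \bigl[x_i - \delta,\; x_i + \delta\bigr], \\
\rho_j &= t\,\lVert z_j - z_{j-1} \rVert_\infty, \\
U^{(j)} &= U^{(j-1)} \cap U_{\rho_j}(z_j)
        = \prod_{i=1}^{n} \bigl[\max(\ell^{(j-1)}_i,\; z_{j,i} - \rho_j),\; \min(u^{(j-1)}_i,\; z_{j,i} + \rho_j)\bigr].
\end{aligned}
```

Here $\ell^{(j-1)}_i$ and $u^{(j-1)}_i$ are the lower and upper bounds of coordinate $i$ in $U^{(j-1)}$. Assuming $z_j$ is crafted inside $U^{(j-1)}$, each interval contains $z_{j,i}$, so the contracted search space is never empty.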

Experimental Results

Research Questions

RQ1: Can a black-box model be provably attacked within a fixed number of iterations using a transfer-based, distillation-driven approach?
RQ2: Under what conditions does a surrogate model enable guaranteed transferability of adversarial examples to the black-box target?
RQ3: How does contracting the adversarial search space influence convergence and attack efficiency?
RQ4: How does CAC perform compared to existing black-box attacks on ImageNet and CIFAR-10 across hard-label and soft-label settings, including transformer targets?

Key Results

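ASR: attack success rate; AQN: average number of queries to the target model; Avg/Std: mean and standard deviation of the l2 and l∞ distances between adversarial examples and target points.
Method	ASR	AQN	Avg l2	Std l2	Avg l∞	Std l∞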
CAC (ours)	1.00	487.95	35.074	18.833	0.153	0.080
HopSkipJump l2	1.00	500.31	48.838	29.118	0.539	0.280
HopSkipJump l∞	1.00	500.01	73.255	35.856	0.361	0.202
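The table's columns can be computed from per-attack records roughly as follows. This is a sketch under assumptions the review does not confirm: AQN is averaged over all attacked points, distances are averaged only over successful attacks, and "Std" is the population standard deviation. The record layout and the `summarize` name are hypothetical.

```python
import math
import statistics


def summarize(records):
    """records: list of (success, n_queries, x, z) with x, z as flat sequences.

    Assumptions: AQN averages over all records; distance statistics use only
    successful attacks (at least one is required); Std is the population std.
    """
    n = len(records)
    succ = [r for r in records if r[0]]
    l2 = [math.dist(x, z) for _, _, x, z in succ]  # Euclidean distance
    linf = [max(abs(a - b) for a, b in zip(x, z)) for _, _, x, z in succ]
    return {
        "ASR": len(succ) / n,
        "AQN": sum(r[1] for r in records) / n,
        "Avg l2": statistics.mean(l2),
        "Std l2": statistics.pstdev(l2),
        "Avg linf": statistics.mean(linf),
        "Std linf": statistics.pstdev(linf),
    }
```

Switching `statistics.pstdev` to `statistics.stdev` gives the sample standard deviation instead; which convention the authors use is not stated here.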

CAC achieves high attack success rates while finding adversarial examples closer to the target points (lower l2 and l∞ distances) than several baselines on ImageNet and CIFAR-10.
In hard-label ImageNet experiments, CAC attains ASR = 1.00 and competitive Avg l2 and l∞ distances across target models (ResNet-50 and ViT-B).
In soft-label settings, CAC maintains strong performance with ASR = 1.00 and favorable distance metrics relative to baseline methods.
On CIFAR-10, CAC consistently produces adversarial examples close to the target points (low l2 and l∞ distances) with high ASR compared to baselines such as HopSkipJump, SignOPT, GeoDA, SquareAttack, and SparseRS.
The authors provide a theoretical convergence lemma that bounds the number of iterations needed to obtain an adversarial example transferable to the target, under the assumption that the surrogate's gradients are bounded.

Figure 2 : Schematic representation of the proposed method. Given alternation iteration $j$ and the target model $T$ , we prepare the distillation dataset $\mathcal{D}(S)$ and train the surrogate model $S_{j}$ . Then, $S_{j}$ is attacked at the target point $x$ in the white-box setting, and an adver


This review was generated by AI and reviewed by a human editor.