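[Paper Review] Contract And Conquer: How to Provably Compute Adversarial Examples for a Black-Box Model?
CAC is a provable, transfer-based method that computes adversarial examples for a black-box model by iteratively distilling a surrogate and contracting the search space, and it comes with a convergence guarantee.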
Black-box adversarial attacks are widely used as tools to test the robustness of deep neural networks against malicious perturbations of input data aimed at a specific change in the output of the model. Such methods, although they remain empirically effective, usually do not guarantee that an adversarial example can be found for a particular model. In this paper, we propose Contract And Conquer (CAC), an approach to provably compute adversarial examples for neural networks in a black-box manner. The method is based on knowledge distillation of a black-box model on an expanding distillation dataset and precise contraction of the adversarial example search space. CAC is supported by the transferability guarantee: we prove that the method yields an adversarial example for the black-box model within a fixed number of algorithm iterations. Experimentally, we demonstrate that the proposed approach outperforms existing state-of-the-art black-box attack methods on the ImageNet dataset for different target models, including vision transformers.
Research Motivation and Objectives
- Motivate robustness testing for black-box models in safety-critical systems.
- Develop a method with theoretical guarantees to produce adversarial examples for a target model.
- Leverage knowledge distillation and a contracting search space to enable transfer-based attacks.
- Compare CAC against existing black-box attacks on ImageNet and CIFAR-10.
- Demonstrate practical effectiveness on targets including Vision Transformers.
Proposed Method
- Iteratively distill a surrogate S from the target T using a distillation dataset centered near the target point x.
- Attack the surrogate S in a white-box setting using MI-FGSM within an evolving search space Uδ(x) that contracts after each iteration.
- Transferability check: if z_j crafted on S also fools T, stop and output z_j; otherwise augment the distillation dataset with (z_j, T(z_j)) and tighten the search space via Uδ(x) ← Uδ(x) ∩ Uρ_j(z_j) with ρ_j = t||z_j − z_{j−1}||∞.
- Use a fixed budget of target-model queries and adjust the gradient step α based on the contraction radius.
- Provide a convergence guarantee: under mild surrogate assumptions, an adversarial example transferable to T is found within a bounded number of iterations.
- Note: CAC is architecture-agnostic; MI-FGSM is used as the motivating white-box attack.
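The loop described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes that Uδ(x) is an l∞ ball (so every search space is an axis-aligned box), that the transfer check is untargeted on hard labels, that z_0 = x, and that the distillation dataset is seeded with x alone. The names `contract_box`, `mi_fgsm_in_box`, `cac_loop`, `query_target`, `fit_surrogate`, and `attack` are hypothetical. The adjustment of the step α and the sampling of distillation points near x are left out.

```python
import numpy as np

def contract_box(lo, hi, z, rho):
    """Intersect the box [lo, hi] with the l_inf ball of radius rho around z."""
    return np.maximum(lo, z - rho), np.minimum(hi, z + rho)

def mi_fgsm_in_box(grad_fn, z0, lo, hi, alpha, steps, mu=1.0):
    """Momentum iterative FGSM; every iterate is clipped back into [lo, hi].

    grad_fn(z) must return the gradient of the surrogate's loss w.r.t. z.
    """
    z = np.clip(z0, lo, hi)
    g = np.zeros_like(z)
    for _ in range(steps):
        grad = grad_fn(z)
        g = mu * g + grad / (np.sum(np.abs(grad)) + 1e-12)  # l1-normalised momentum
        z = np.clip(z + alpha * np.sign(g), lo, hi)
    return z

def cac_loop(x, y, query_target, fit_surrogate, attack, delta, t, max_iters):
    """Outer loop: distill, attack the surrogate, check transfer, contract.

    query_target(z)   -> hard label from the black box (one query per call)
    fit_surrogate(d)  -> surrogate trained on the list d of (point, label) pairs
    attack(s, x, lo, hi) -> candidate z inside the box [lo, hi]
    """
    lo, hi = x - delta, x + delta      # U_delta(x) as an l_inf box
    data = [(x.copy(), y)]             # assumption: dataset seeded with x only
    z_prev = x.copy()                  # assumption: z_0 = x
    for j in range(1, max_iters + 1):
        surrogate = fit_surrogate(data)
        z = attack(surrogate, x, lo, hi)
        label = query_target(z)
        if label != y:                 # untargeted transfer check
            return z, j
        data.append((z.copy(), label))
        rho = t * np.max(np.abs(z - z_prev))   # rho_j = t * ||z_j - z_{j-1}||_inf
        lo, hi = contract_box(lo, hi, z, rho)
        z_prev = z
    return None, max_iters

# Toy usage: a hypothetical 2-D black box and a fixed surrogate gradient.
x = np.array([0.0, 0.0])
black_box = lambda z: int(z[0] > 0.3)
toy_attack = lambda s, x0, lo, hi: mi_fgsm_in_box(
    lambda z: np.array([1.0, 0.0]), x0, lo, hi, alpha=0.1, steps=10)
z_adv, iters = cac_loop(x, 0, black_box, lambda d: None, toy_attack,
                        delta=0.5, t=0.5, max_iters=5)
```

Because the intersection of two l∞ balls is again a box, the contraction reduces to an element-wise `maximum`/`minimum`, and the projection inside the white-box attack stays a plain `np.clip`.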

Experimental Results
Research Questions
- RQ1: Can a black-box model be provably attacked within a fixed number of iterations using a transfer-based, distillation-driven approach?
- RQ2: Under what conditions does a surrogate model enable guaranteed transferability of adversarial examples to the black-box target?
- RQ3: How does contracting the adversarial search space influence convergence and attack efficiency?
- RQ4: How does CAC perform compared to existing black-box attacks on ImageNet and CIFAR-10 across hard-label and soft-label settings, including transformer targets?
Key Results
| Method | ASR | AQN | Avg l2 | Std l2 | Avg l∞ | Std l∞ |
|---|---|---|---|---|---|---|
| CAC (ours) | 1.00 | 487.95 | 35.074 | 18.833 | 0.153 | 0.080 |
| HopSkipJump l2 | 1.00 | 500.31 | 48.838 | 29.118 | 0.539 | 0.280 |
| HopSkipJump l∞ | 1.00 | 500.01 | 73.255 | 35.856 | 0.361 | 0.202 |
- CAC achieves high attack success rates with adversarial examples closer to target points (lower l2 and l∞ distances) than several baselines on ImageNet and CIFAR-10.
- In hard-label ImageNet experiments, CAC attains ASR = 1.00 and competitive Avg l2 and l∞ distances across target models (ResNet-50 and ViT-B).
- In soft-label settings, CAC maintains strong performance with ASR = 1.00 and favorable distance metrics relative to baseline methods.
- Across CIFAR-10, CAC consistently shows close adversarial examples (low l∞ and l2) and high ASR compared to baselines such as HopSkipJump, SignOPT, GeoDA, SquareAttack, and SparseRS.
- The authors provide a theoretical convergence lemma that bounds the number of iterations needed to reach a transferable adversarial example, under a bounded-gradient assumption on the surrogate.
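For readers who want to reproduce the columns of the results table on their own runs, the following is one plausible way to compute them. It is a sketch under assumptions: this review does not expand AQN, which is read here as the average number of target-model queries per attacked sample, and the distance statistics are assumed to be taken over successful attacks only, using the population standard deviation. The function name `attack_metrics` and its arguments are hypothetical.

```python
import numpy as np

def attack_metrics(originals, adversarials, success, queries):
    """Summarise attack runs in the shape of the table's columns.

    Assumes at least one successful run; otherwise the distance
    statistics are undefined (NaN).
    """
    x = np.asarray(originals, dtype=float)
    z = np.asarray(adversarials, dtype=float)
    ok = np.asarray(success, dtype=bool)
    # Flatten each sample and keep perturbations of successful runs only.
    diff = (z - x).reshape(len(x), -1)[ok]
    l2 = np.linalg.norm(diff, axis=1)
    linf = np.max(np.abs(diff), axis=1)
    return {
        "ASR": float(ok.mean()),             # fraction of successful attacks
        "AQN": float(np.mean(queries)),      # assumed: mean queries per sample
        "avg_l2": float(l2.mean()), "std_l2": float(l2.std()),
        "avg_linf": float(linf.mean()), "std_linf": float(linf.std()),
    }

# Toy usage with two hypothetical 2-D samples.
metrics = attack_metrics(
    originals=[[0.0, 0.0], [0.0, 0.0]],
    adversarials=[[3.0, 4.0], [0.0, 1.0]],
    success=[True, True],
    queries=[10, 20],
)
```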

This review was generated by AI and reviewed by a human editor.