QUICK REVIEW

[Paper Review] Contract And Conquer: How to Provably Compute Adversarial Examples for a Black-Box Model?

Anna Chistyakova, Mikhail Pautov|arXiv (Cornell University)|Mar 11, 2026

Adversarial Robustness in Machine Learning|Citations: 0

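One-line summary

CAC repeatedly distills a surrogate and contracts the search space, giving a transfer-based method that provably computes adversarial examples for a black-box model, with a convergence guarantee.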

ABSTRACT

Black-box adversarial attacks are widely used as tools to test the robustness of deep neural networks against malicious perturbations of input data aimed at a specific change in the output of the model. Such methods, although they remain empirically effective, usually do not guarantee that an adversarial example can be found for a particular model. In this paper, we propose Contract And Conquer (CAC), an approach to provably compute adversarial examples for neural networks in a black-box manner. The method is based on knowledge distillation of a black-box model on an expanding distillation dataset and precise contraction of the adversarial example search space. CAC is supported by the transferability guarantee: we prove that the method yields an adversarial example for the black-box model within a fixed number of algorithm iterations. Experimentally, we demonstrate that the proposed approach outperforms existing state-of-the-art black-box attack methods on ImageNet dataset for different target models, including vision transformers.

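Motivation and objectives

Motivate robustness testing of black-box models in safety-critical systems.
Develop a method with theoretical guarantees for generating adversarial examples against a target model.
Enable transfer-based attacks by combining knowledge distillation with a contracting search space.
Compare CAC with existing black-box attacks on ImageNet and CIFAR-10.
Demonstrate practical effectiveness on targets that include Vision Transformers.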

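Proposed method

Iteratively distill a surrogate S from the target T, using a distillation dataset built around the target point x.
Attack the surrogate S in the white-box setting with MI-FGSM, restricting the attack to the search space Uδ(x), which contracts after each iteration.
Transferability check: if the example z_j crafted on S also fools T, stop and output z_j. Otherwise, extend the distillation dataset with (z_j, T(z_j)) and shrink the search space to Uδ(x) ← Uδ(x) ∩ Uρ_j(z_j), where ρ_j = t||z_j − z_{j−1}||∞.
Use a fixed budget of target-model queries and adjust the gradient step α according to the contraction radius.
Convergence guarantee: under mild assumptions on the surrogate, an adversarial example that transfers to the target is found within a finite number of iterations.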
Note: CAC is architecture-agnostic; MI-FGSM serves as the white-box attack applied to the surrogate.
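The steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the neighborhoods U are treated as l∞ boxes, the step size is set to `radius / steps`, each iteration restarts MI-FGSM from x clipped into the current box, and the initial distillation set is sampled uniformly around x. The names `cac`, `mi_fgsm`, `intersect_boxes`, `linear_surrogate`, and all parameters other than t and δ are hypothetical.

```python
import numpy as np

def intersect_boxes(lo, hi, center, radius):
    """Intersect the box [lo, hi] with the l_inf ball of `radius` around `center`."""
    return np.maximum(lo, center - radius), np.minimum(hi, center + radius)

def mi_fgsm(grad_fn, x, lo, hi, alpha, steps=10, mu=1.0):
    """Momentum iterative FGSM: ascend the surrogate loss, staying inside [lo, hi]."""
    z = np.clip(x, lo, hi)
    g = np.zeros_like(z)
    for _ in range(steps):
        grad = grad_fn(z)
        g = mu * g + grad / (np.abs(grad).sum() + 1e-12)  # l1-normalised gradient
        z = np.clip(z + alpha * np.sign(g), lo, hi)
    return z

def linear_surrogate(data, y):
    """Toy surrogate (assumption): least-squares linear score for 'label == y'."""
    X = np.stack([p for p, _ in data])
    s = np.array([1.0 if lab == y else -1.0 for _, lab in data])
    A = np.hstack([X, np.ones((len(X), 1))])
    w = np.linalg.lstsq(A, s, rcond=None)[0][:-1]
    return lambda z: -w  # gradient of the loss -(w.z + b) with respect to z

def cac(query_target, train_surrogate, x, delta, t=0.5,
        budget=500, n_init=50, steps=10, seed=0):
    """Distill, attack, check transfer, contract; stop when the query budget is spent."""
    rng = np.random.default_rng(seed)
    queries = 0

    def T(p):  # hard-label black-box query, counted against the budget
        nonlocal queries
        queries += 1
        return query_target(p)

    y = T(x)
    lo, hi = x - delta, x + delta  # initial search space U_delta(x)
    noise = rng.uniform(-delta, delta, size=(n_init,) + x.shape)
    data = [(p, T(p)) for p in x + noise]  # initial distillation dataset
    z_prev = x
    while queries < budget:
        grad_fn = train_surrogate(data, y)            # (re)distill the surrogate
        alpha = float(np.max(hi - lo)) / (2 * steps)  # step tied to the current radius
        z = mi_fgsm(grad_fn, x, lo, hi, alpha, steps)
        label = T(z)
        if label != y:                                # transfer succeeded
            return z, queries
        data.append((z, label))                       # expand the distillation dataset
        rho = t * float(np.max(np.abs(z - z_prev)))   # rho_j = t * ||z_j - z_{j-1}||_inf
        lo, hi = intersect_boxes(lo, hi, z, rho)      # contract the search space
        z_prev = z
    return None, queries
```

Calling `cac(target_fn, linear_surrogate, x, delta)` with any hard-label function `target_fn` returns the adversarial point (or `None` if the budget runs out) together with the number of queries used. A real surrogate would be a neural network trained on the distillation dataset; the linear fit only keeps the sketch self-contained.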

Figure 1: Illustration of the contraction of the adversarial example search space. Given the number $j$ of algorithm iteration, the adversarial example search space on iteration $j$, namely $U_{\delta}(x)_{j}$, is the intersection of the $\rho_{j}$-vicinity of an adversarial example $z_{j}$ with

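Experimental results

Research questions

RQ1: Can a black-box model be provably attacked within a fixed number of iterations by a transfer-based, distillation-driven approach?
RQ2: Under what conditions does a surrogate model enable guaranteed transferability of adversarial examples to the black-box target?
RQ3: How does contracting the adversarial search space affect convergence and attack efficiency?
RQ4: How does CAC compare with existing black-box attacks on ImageNet and CIFAR-10, in hard-label and soft-label settings, including transformer targets?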

Key findings

Method	ASR	AQN	Avg l2	Std l2	Avg l∞	Std l∞
CAC (ours)	1.00	487.95	35.074	18.833	0.153	0.080
HopSkipJump l2	1.00	500.31	48.838	29.118	0.539	0.280
HopSkipJump l∞	1.00	500.01	73.255	35.856	0.361	0.202

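On ImageNet and CIFAR-10, CAC achieves a high attack success rate while producing adversarial examples close to the target point, with smaller l2 and l∞ distances than several baselines.
In the hard-label ImageNet experiments, CAC reaches ASR = 1.00, with competitive average l2 and l∞ distances across target models such as ResNet-50 and ViT-B.
In the soft-label setting, CAC also maintains ASR = 1.00, with distance metrics that compare favorably with the baselines.
Across CIFAR-10, CAC consistently yields closer adversarial examples (lower l∞ and l2) and higher ASR than baselines such as HopSkipJump, SignOPT, GeoDA, SquareAttack, and SparseRS.
The authors provide a theoretical convergence lemma that bounds the number of iterations needed to obtain a transferable adversarial example, under an assumption bounding the surrogate's gradients.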
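For readers who want to reproduce the columns of the results table, the sketch below shows one plausible way to compute them from per-image attack outputs. Several points are assumptions, since the review does not define them: ASR is taken as the fraction of successful attacks, AQN as the mean number of target queries over all attacked images, and the distance statistics as computed over successful examples only. The function name `attack_metrics` and its arguments are hypothetical.

```python
import numpy as np

def attack_metrics(x_clean, x_adv, success, n_queries):
    """Summarise an attack run.

    x_clean, x_adv: arrays of shape (N, ...) holding original and adversarial inputs.
    success: length-N booleans, True where the target model was fooled.
    n_queries: length-N counts of target-model queries per input.
    """
    success = np.asarray(success, dtype=bool)
    diff = np.asarray(x_adv, dtype=float) - np.asarray(x_clean, dtype=float)
    diff = diff.reshape(len(success), -1)[success]  # successful examples only (assumption)
    l2 = np.linalg.norm(diff, axis=1)               # per-example l2 distance
    linf = np.abs(diff).max(axis=1)                 # per-example l_inf distance
    return {
        "ASR": float(success.mean()),
        "AQN": float(np.mean(n_queries)),
        "avg_l2": float(l2.mean()),
        "std_l2": float(l2.std()),
        "avg_linf": float(linf.mean()),
        "std_linf": float(linf.std()),
    }
```

If no attack succeeds, the distance entries are undefined (NaN), so a caller may want to check `ASR > 0` before reading them.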

Figure 2: Schematic representation of the proposed method. Given alternation iteration $j$ and the target model $T$, we prepare the distillation dataset $\mathcal{D}(S)$ and train the surrogate model $S_{j}$. Then, $S_{j}$ is attacked at the target point $x$ in the white-box setting, and an adver

This review was generated by AI and checked by a human editor.