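[Paper Review] Prior Convictions: Black-Box Adversarial Attacks with Bandits and Priors
This paper proposes a bandit optimization framework that exploits gradient priors to run black-box adversarial attacks more efficiently, lowering both the number of queries and the failure rate relative to prior methods.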
We study the problem of generating adversarial examples in a black-box setting in which only loss-oracle access to a model is available. We introduce a framework that conceptually unifies much of the existing work on black-box attacks, and we demonstrate that the current state-of-the-art methods are optimal in a natural sense. Despite this optimality, we show how to improve black-box attacks by bringing a new element into the problem: gradient priors. We give a bandit optimization-based algorithm that allows us to seamlessly integrate any such priors, and we explicitly identify and incorporate two examples. The resulting methods use two to four times fewer queries and fail two to five times less often than the current state-of-the-art.
Motivation and Objectives
- Formalize gradient estimation as the central problem for query-efficient black-box attacks, and show that least-squares estimation is optimal in the standard setting, where no prior knowledge about the gradient is used.
- Introduce gradient priors (time-dependent and data-dependent) to exploit structure in gradients.
- Develop a bandit optimization framework to integrate priors into gradient estimation for adversarial example generation.
- Demonstrate substantial improvements in query efficiency and reduced failure rates over prior state-of-the-art on ImageNet classifiers.
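To make the least-squares point concrete, here is a minimal sketch on a toy quadratic loss (an assumption; real attacks query a classifier's loss). It estimates k directional derivatives with two queries each along Gaussian directions, then compares the minimum-norm least-squares solution with an NES-style average. The helper names `directional_derivatives`, `least_squares_grad`, and `nes_grad` are hypothetical. Because the least-squares solution is the projection of the true gradient onto the span of the query directions, its cosine with the true gradient can never be lower than NES's, and when k ≪ d the two estimates point in nearly the same direction.

```python
import numpy as np

def directional_derivatives(loss, x, A, delta=1e-3):
    """Two-query (antithetic) finite-difference estimate of <grad L(x), a_i> per row a_i."""
    return np.array([(loss(x + delta * a) - loss(x - delta * a)) / (2 * delta) for a in A])

def least_squares_grad(A, y):
    """Minimum-norm least-squares solution of A g = y (underdetermined when k < d)."""
    g, *_ = np.linalg.lstsq(A, y, rcond=None)
    return g

def nes_grad(A, y):
    """NES-style estimate: average of y_i * a_i over the k Gaussian directions."""
    return A.T @ y / len(y)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
d, k = 10_000, 50                       # input dimension, number of query directions
c = rng.standard_normal(d)
loss = lambda x: 0.5 * float(np.sum((x - c) ** 2))   # toy loss; true gradient is x - c
x = np.zeros(d)
true_grad = x - c

A = rng.standard_normal((k, d))         # k random Gaussian query directions (2k queries total)
y = directional_derivatives(loss, x, A)

g_ls = least_squares_grad(A, y)
g_nes = nes_grad(A, y)
print("cos(LS, NES)   =", cosine(g_ls, g_nes))
print("cos(LS, true)  =", cosine(g_ls, true_grad))
print("cos(NES, true) =", cosine(g_nes, true_grad))
```

Both estimates only see a k-dimensional slice of a d-dimensional gradient, so their alignment with the true gradient stays small (roughly sqrt(k/d) here); this ceiling is what gradient priors are meant to lift.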
Proposed Method
- Model the gradient estimation task as a bandit problem where actions are gradient estimates and losses measure inner-product alignment with the true gradient.
- Show that least-squares gradient estimation is approximately equivalent to NES and is optimal in the underdetermined regime (fewer queries than input dimensions).
- Propose two gradient priors: time-dependent (gradient correlation along optimization path) and data-dependent (spatial gradient similarity via tiling).
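- Use a two-query (antithetic) finite-difference estimator along a random spherical direction inside a bandit update: an update rule A refines a latent vector v_t, and the gradient estimate g_t is read off from v_t.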
- Translate the bandit-optimized gradient into an iterative adversarial attack by updating inputs with projected gradient steps and projecting back to the perturbation set.
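One way to write the bandit loss and the two-query estimator described above, in our own notation (an assumption, not copied from the paper): η is the query step size, δ the exploration radius, u a random unit direction in the m-dimensional latent space, U the tiling upsampler of the data-dependent prior, α the latent learning rate, and h the image step size. Ascending Δ_t descends the bandit loss, since its expectation over u is proportional to the negative loss gradient with respect to v.

```latex
\ell_t(g) = -\left\langle \nabla_x L(x_{t-1}),\ \frac{g}{\|g\|} \right\rangle,
\qquad
\left\langle \nabla_x L(x),\ \frac{g}{\|g\|} \right\rangle \approx \frac{L\!\left(x + \eta\, g/\|g\|\right) - L(x)}{\eta}

q_{\pm} = v_{t-1} \pm \delta u, \qquad u \sim \mathrm{Unif}\!\left(\mathbb{S}^{m-1}\right)

\Delta_t = \frac{L\!\left(x_{t-1} + \eta\, \tfrac{U(q_+)}{\|U(q_+)\|}\right) - L\!\left(x_{t-1} + \eta\, \tfrac{U(q_-)}{\|U(q_-)\|}\right)}{\eta\,\delta}\, u
\;\approx\; -2\,\big\langle \nabla_v\, \ell_t\big(U(v_{t-1})\big),\, u \big\rangle\, u,
\qquad
\mathbb{E}_u[\Delta_t] \approx -\frac{2}{m}\, \nabla_v\, \ell_t\big(U(v_{t-1})\big)

v_t = v_{t-1} + \alpha\, \Delta_t,
\qquad
x_t = \Pi_{B_\infty(x_0,\varepsilon)\,\cap\,[0,1]^d}\!\left(x_{t-1} + h\, \operatorname{sign}\big(U(v_t)\big)\right)
```

The plain gradient step on v is a simplification; the paper's update rule A may differ.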
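The steps above can be sketched as a small ℓ∞ attack loop. This is a minimal illustration under stated assumptions, not the authors' implementation: images are 2-D grayscale arrays in [0, 1] whose sides divide by the tile size, the loss oracle is a toy linear function to be maximized, `upsample` and `bandits_td_linf` are hypothetical names, the hyperparameter values are illustrative, and the latent vector v is updated with a plain gradient step (swapped in for whatever rule A the paper uses). The time-dependent prior is simply that v persists across iterations; the data-dependent prior is that v lives on a coarse grid and is tiled up to image resolution.

```python
import numpy as np

def upsample(v, tile):
    """Data-dependent prior (hypothetical helper): each latent entry is shared
    by a tile x tile block of pixels."""
    return np.kron(v, np.ones((tile, tile)))

def bandits_td_linf(loss, x0, eps=0.05, tile=4, steps=300, fd_eta=0.1,
                    exploration=0.1, online_lr=0.05, image_lr=0.01, seed=0):
    """Sketch of an l_inf bandit attack with time and data priors.

    loss: black-box oracle returning a scalar to maximize (only queried).
    x0:   2-D image in [0, 1] whose sides are divisible by `tile`.
    """
    rng = np.random.default_rng(seed)
    h, w = x0.shape
    v = np.zeros((h // tile, w // tile))  # latent gradient estimate, kept across steps (time prior)
    x = x0.copy()
    queries = 0
    for _ in range(steps):
        u = rng.standard_normal(v.shape)
        u /= np.linalg.norm(u)                      # random unit exploration direction
        q1 = upsample(v + exploration * u, tile)
        q2 = upsample(v - exploration * u, tile)
        l1 = loss(x + fd_eta * q1 / np.linalg.norm(q1))
        l2 = loss(x + fd_eta * q2 / np.linalg.norm(q2))
        queries += 2
        # Two-query antithetic estimate of how the alignment between the
        # (normalized) estimate and the true gradient changes along u.
        grad_v = (l1 - l2) / (fd_eta * exploration) * u
        v = v + online_lr * grad_v                  # plain gradient-ascent step (simplification)
        # l_inf signed step with the current estimate, then project back onto
        # the eps-ball around x0 and the valid pixel range.
        x = x + image_lr * np.sign(upsample(v, tile))
        x = np.clip(x, x0 - eps, x0 + eps)
        x = np.clip(x, 0.0, 1.0)
    return x, queries

# Usage on a toy problem: a linear "loss" whose gradient is constant in time
# and constant within 4x4 tiles, so both priors hold exactly.
rng = np.random.default_rng(1)
w_true = upsample(rng.standard_normal((4, 4)), 4)
loss = lambda x: float(np.sum(w_true * x))
x0 = np.full((16, 16), 0.5)
x_adv, n_queries = bandits_td_linf(loss, x0)
print("loss before:", loss(x0), "after:", loss(x_adv), "queries:", n_queries)
```

Normalizing each query direction keeps the probe at a fixed distance fd_eta from x, so the loss difference measures the direction of the estimate rather than its magnitude.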
Experimental Results
Research Questions
- RQ1: Can gradient estimation for black-box adversarial attacks be made more efficient by exploiting priors?
- RQ2: Which priors on gradients (time-dependent, data-dependent) improve query efficiency and reduce failure rates?
- RQ3: Is there a principled bandit framework for incorporating gradient priors into black-box adversarial example generation?
- RQ4: How do priors affect performance in ℓ2- and ℓ∞-constrained ImageNet attacks?
Key Findings
| Attack | Avg. queries (ℓ∞) | Avg. queries (ℓ2) | Failure rate (ℓ∞) | Failure rate (ℓ2) | Avg. queries on images NES attacks successfully (ℓ∞) | Avg. queries on images NES attacks successfully (ℓ2) |
|---|---|---|---|---|---|---|
| NES | 1735 | 2938 | 22.2% | 34.4% | 1735 | 2938 |
| Bandits T | 1781 | 2690 | 11.6% | 30.4% | 1214 | 2421 |
| Bandits TD | 1117 | 1858 | 4.6% | 15.5% | 703 | 999 |
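- Bandits TD (time + data priors) cuts the failure rate by 2–5x relative to NES, the prior state of the art (22.2% → 4.6% for ℓ∞, 34.4% → 15.5% for ℓ2).
- On the images NES attacks successfully, Bandits TD uses about 2.5x (ℓ∞) and 2.9x (ℓ2) fewer queries than NES; the abstract reports 2–4x fewer queries overall. Bandits T alone needs a comparable average number of queries to NES (1781 vs. 1735 for ℓ∞) but fails less often.
- Under both ℓ∞ and ℓ2 constraints on ImageNet, Bandits TD needs substantially fewer average queries per successful attack (1117 vs. 1735 for ℓ∞; 1858 vs. 2938 for ℓ2).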
- Two priors (time-correlated gradients along optimization path and data-dependent spatial gradient similarity) substantially improve gradient prediction quality.
- The least-squares gradient estimator remains optimal in the standard setting, but priors enable further gains beyond this baseline.
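The reduction factors quoted in these findings can be checked directly from the table. The short script below copies the NES-success query counts and failure rates from the table and divides the NES baseline by each Bandits variant; the dictionary layout and the `reductions` helper are illustrative choices.

```python
# Values copied from the table above:
# (avg. queries on images NES attacks successfully, failure rate in %)
nes = {"linf": (1735, 22.2), "l2": (2938, 34.4)}
bandits_t = {"linf": (1214, 11.6), "l2": (2421, 30.4)}
bandits_td = {"linf": (703, 4.6), "l2": (999, 15.5)}

def reductions(baseline, method):
    """Return {norm: (query reduction factor, failure-rate reduction factor)}."""
    return {
        norm: (baseline[norm][0] / method[norm][0], baseline[norm][1] / method[norm][1])
        for norm in baseline
    }

for name, method in [("Bandits T", bandits_t), ("Bandits TD", bandits_td)]:
    for norm, (q, f) in reductions(nes, method).items():
        print(f"{name:10s} {norm:4s}: {q:.2f}x fewer queries, {f:.2f}x lower failure rate")
```

For Bandits TD the failure-rate factors land at the two ends of the quoted 2–5x range, one per norm.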
This review was generated by AI and checked by a human editor.