QUICK REVIEW

[Paper Review] Adversarial Machine Learning at Scale

Alexey Kurakin, Ian Goodfellow, Samy Bengio | arXiv (Cornell University) | Nov 4, 2016

Adversarial Robustness in Machine Learning | Cited by 375

One-Line Summary

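This paper demonstrates scalable adversarial training on ImageNet with Inception v3, shows that it improves robustness to one-step adversarial attacks, and discusses transferability, the effect of model capacity, and the label leaking phenomenon.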

ABSTRACT

Adversarial examples are malicious inputs designed to fool machine learning models. They often transfer from one model to another, allowing attackers to mount black box attacks without knowledge of the target model's parameters. Adversarial training is the process of explicitly training a model on adversarial examples, in order to make it more robust to attack or to reduce its test error on clean inputs. So far, adversarial training has primarily been applied to small problems. In this research, we apply adversarial training to ImageNet. Our contributions include: (1) recommendations for how to successfully scale adversarial training to large models and datasets, (2) the observation that adversarial training confers robustness to single-step attack methods, (3) the finding that multi-step attack methods are somewhat less transferable than single-step attack methods, so single-step attacks are the best for mounting black-box attacks, and (4) resolution of a "label leaking" effect that causes adversarially trained models to perform better on adversarial examples than on clean examples, because the adversarial example construction process uses the true label and the model can learn to exploit regularities in the construction process.
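To make the single-step vs multi-step distinction concrete, the standard formulations of the two attack families are written out below as a sketch. Here X is the input image, y_true its true label, J the training loss (e.g. cross-entropy), ε the perturbation size, α the per-step size of the iterative method, and Clip_{X,ε} keeps every pixel within ε of X (and inside the valid pixel range). The symbol names are our own choice; the one-step form is the usual fast gradient sign method (FGSM), and the multi-step form is the usual basic iterative method.

```latex
% One-step attack (FGSM), built from the true label y_{true}:
X^{adv} = X + \epsilon \, \operatorname{sign}\big( \nabla_X J(X, y_{true}) \big)

% Multi-step attack (basic iterative method):
X^{adv}_0 = X, \qquad
X^{adv}_{N+1} = \operatorname{Clip}_{X,\epsilon}\Big\{ X^{adv}_N
  + \alpha \, \operatorname{sign}\big( \nabla_X J(X^{adv}_N, y_{true}) \big) \Big\}
```

Both forms use y_true, which is what makes the label leaking effect in contribution (4) possible.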

Motivation and Objectives

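Demonstrate scalable adversarial training on a large model and dataset (ImageNet) using batch normalization and minibatches that mix adversarial and clean examples.
Evaluate the robustness of the trained models against a range of adversarial attack methods, with particular attention to the difference between one-step and multi-step attacks.
Investigate how model capacity and training choices affect robustness to adversarial perturbations.
Characterize the transferability of adversarial examples between models and examine what it implies for black-box attacks.
Expose and analyze the label leaking effect in adversarial training.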

Proposed Method

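Review and compare several adversarial example generation methods (one-step and iterative).
Propose an adversarial training algorithm that injects adversarial examples into every minibatch, with a loss-weighting parameter λ (lambda) controlling how much the adversarial examples contribute to the loss.
Draw a random epsilon independently for each example to avoid overfitting to a single fixed perturbation size.
Use batch normalization and minibatches that mix clean and adversarial examples for stable large-scale training.
Evaluate on ImageNet with Inception v3, trained with RMSProp and synchronous distributed training across 50 machines.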
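The training recipe listed above (FGSM examples injected into each minibatch, a per-example random ε, and a λ-weighted loss) can be sketched on a toy problem. This is a minimal illustration under our own assumptions, not the authors' implementation: a two-feature logistic-regression model stands in for Inception v3, inputs lie in [0, 1], plain SGD replaces RMSProp, and the truncated-normal ε distribution, `k=16` of 32 examples, `lam=0.3`, the learning rate, and all function names are illustrative choices.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """P(y = 1 | x) for a toy logistic-regression model (stand-in for Inception v3)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def cross_entropy(p, y):
    tiny = 1e-12  # guards against log(0)
    return -(y * math.log(p + tiny) + (1 - y) * math.log(1 - p + tiny))


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, epsilon):
    """One-step FGSM: x + epsilon * sign(dJ/dx), clipped to the valid range [0, 1]."""
    p = predict(w, b, x)
    # For logistic regression with cross-entropy, dJ/dx_i = (p - y) * w_i.
    return [min(1.0, max(0.0, xi + epsilon * sign((p - y) * wi)))
            for xi, wi in zip(x, w)]


def sample_epsilon(sigma, eps_max, rng):
    """Per-example epsilon from N(0, sigma) truncated to [0, eps_max] (rejection sampling)."""
    while True:
        e = abs(rng.gauss(0.0, sigma))
        if e <= eps_max:
            return e


def adversarial_training_step(w, b, batch, k, lam, lr, sigma, eps_max, rng):
    """Replace the first k of m examples with FGSM examples built from the current
    model, then take one SGD step on
        (sum_clean J + lam * sum_adv J) / ((m - k) + lam * k)."""
    m = len(batch)
    norm = (m - k) + lam * k
    grad_w, grad_b, loss = [0.0] * len(w), 0.0, 0.0
    for i, (x, y) in enumerate(batch):
        if i < k:
            x, weight = fgsm(w, b, x, y, sample_epsilon(sigma, eps_max, rng)), lam
        else:
            weight = 1.0
        p = predict(w, b, x)
        loss += weight * cross_entropy(p, y) / norm
        for j, xj in enumerate(x):
            grad_w[j] += weight * (p - y) * xj / norm
        grad_b += weight * (p - y) / norm
    new_w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
    return new_w, b - lr * grad_b, loss


def mean_clean_loss(w, b, data):
    return sum(cross_entropy(predict(w, b, x), y) for x, y in data) / len(data)


# Toy run: the label is 1 when the first feature exceeds 0.5.
rng = random.Random(0)
data = [(x, 1 if x[0] > 0.5 else 0)
        for x in ([rng.random(), rng.random()] for _ in range(64))]
w, b = [0.0, 0.0], 0.0
for epoch in range(50):
    rng.shuffle(data)
    for start in range(0, len(data), 32):
        w, b, _ = adversarial_training_step(
            w, b, data[start:start + 32], k=16, lam=0.3, lr=0.5,
            sigma=0.03, eps_max=0.06, rng=rng)
final_loss = mean_clean_loss(w, b, data)
print(f"clean loss after training: {final_loss:.3f}")
```

Adversarial examples are generated from the current parameters and then treated as fixed inputs, so no gradient flows through the generation step; with `k=0` the update reduces to ordinary clean training.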

Experimental Results

Research Questions

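RQ1: How can adversarial training be scaled to large models and datasets such as ImageNet?
RQ2: Does adversarial training with one-step attacks provide robustness against other one-step attacks and against some multi-step attacks?
RQ3: How does model capacity affect adversarial robustness, with and without adversarial training?
RQ4: How transferable are adversarial examples between models, and how does the attack type affect transferability?
RQ5: Does a label leaking phenomenon occur in adversarial training, and how should attacks be constructed for reliable evaluation?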

Key Findings

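Adversarial training with one-step methods increases robustness to those one-step attacks, reaching roughly 74% top-1 accuracy on adversarial examples at the cost of about a 0.8% drop in clean accuracy.
Increasing model capacity (making the model deeper or wider) improves robustness when combined with adversarial training.
Iterative adversarial examples remain largely effective against models made robust by one-step training, which suggests that one-step adversarial training gives only limited protection against multi-step attacks.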
Transferability is high for FGSM-style (one-step) adversarial examples and lower for iterative (multi-step) methods, which makes one-step attacks the better choice for mounting black-box attacks.
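A label leaking effect was observed: when the true label is used to construct one-step adversarial examples, the model's accuracy on adversarial examples exceeds its accuracy on clean examples. The effect disappears when the true label is not used in the construction (for example, when the model's predicted label is used instead) or when iterative methods are used.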
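One simple way to see why true-label construction can leak the label is to compare the two label sources on a toy binary logistic-regression model. The sketch below is a hypothetical illustration, not the paper's setup: inputs are assumed to lie in [0, 1], `fgsm` takes an optional label and falls back to the model's own prediction, and `iterative_fgsm` is the basic iterative method with an illustrative step size `alpha`. All names are our own.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """P(y = 1 | x) for a toy logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(v):
    return (v > 0) - (v < 0)


def clip(values, lo, hi):
    return [min(h, max(l, v)) for v, l, h in zip(values, lo, hi)]


def fgsm(w, b, x, epsilon, label=None):
    """One-step FGSM. If label is None, the model's predicted label is used
    instead of the true label, so the perturbation cannot encode the true label."""
    p = predict(w, b, x)
    y = label if label is not None else (1 if p >= 0.5 else 0)
    step = [xi + epsilon * sign((p - y) * wi) for xi, wi in zip(x, w)]
    return clip(step, [0.0] * len(x), [1.0] * len(x))


def iterative_fgsm(w, b, x, y, epsilon, alpha, steps):
    """Basic iterative method: repeated small FGSM steps, each followed by
    Clip_{X, eps} (stay within epsilon of x) and clipping to [0, 1]."""
    lo = [max(0.0, xi - epsilon) for xi in x]
    hi = [min(1.0, xi + epsilon) for xi in x]
    x_adv = list(x)
    for _ in range(steps):
        p = predict(w, b, x_adv)
        x_adv = clip([xa + alpha * sign((p - y) * wi)
                      for xa, wi in zip(x_adv, w)], lo, hi)
    return x_adv


# A misclassified point: true label 1, but the model predicts 0.
w, b, x = [2.0], -1.0, [0.2]
with_true_label = fgsm(w, b, x, 0.1, label=1)  # pushes x further from class 1
with_predicted_label = fgsm(w, b, x, 0.1)       # pushes x away from predicted class 0
print(with_true_label, with_predicted_label)
```

On a correctly classified point the two label sources produce the same perturbation; they differ only where the model is wrong, which is exactly where a true-label perturbation carries information about the label that the model could learn to exploit.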

This review was generated by AI and checked by a human editor.