QUICK REVIEW

[Paper Review] Perceptual Adversarial Robustness: Defense Against Unseen Threat Models

Cassidy Laidlaw, Sahil Singla | arXiv (Cornell University) | Jun 22, 2020

Adversarial Robustness in Machine Learning | References: 46 | Citations: 19

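One-sentence summary

This paper proposes Perceptual Adversarial Training (PAT), which trains models against the set of imperceptible adversarial examples by using LPIPS, a neural perceptual distance, as a surrogate for human perception. On CIFAR-10 and ImageNet-100, PAT is evaluated against five diverse unseen attacks (L₂, L∞, spatial transformation, recoloring, and JPEG compression). Without training against any of them, it achieves state-of-the-art robustness against their union, more than doubling the accuracy of the next best model. This indicates strong generalization to unforeseen threat models.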

ABSTRACT

A key challenge in adversarial robustness is the lack of a precise mathematical characterization of human perception, used in the very definition of adversarial attacks that are imperceptible to human eyes. Most current attacks and defenses try to avoid this issue by considering restrictive adversarial threat models such as those bounded by $L_2$ or $L_\infty$ distance, spatial perturbations, etc. However, models that are robust against any of these restrictive threat models are still fragile against other threat models. To resolve this issue, we propose adversarial training against the set of all imperceptible adversarial examples, approximated using deep neural networks. We call this threat model the neural perceptual threat model (NPTM); it includes adversarial examples with a bounded neural perceptual distance (a neural network-based approximation of the true perceptual distance) to natural images. Through an extensive perceptual study, we show that the neural perceptual distance correlates well with human judgements of perceptibility of adversarial examples, validating our threat model. Under the NPTM, we develop novel perceptual adversarial attacks and defenses. Because the NPTM is very broad, we find that Perceptual Adversarial Training (PAT) against a perceptual attack gives robustness against many other types of adversarial attacks. We test PAT on CIFAR-10 and ImageNet-100 against five diverse adversarial attacks. We find that PAT achieves state-of-the-art robustness against the union of these five attacks, more than doubling the accuracy over the next best model, without training against any of them. That is, PAT generalizes well to unforeseen perturbation types. This is vital in sensitive applications where a particular threat model cannot be assumed, and to the best of our knowledge, PAT is the first adversarial training defense with this property.

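Motivation and objectives

Address the lack of a precise mathematical characterization of human perception in adversarial robustness research.
Overcome the limitation of restrictive threat models such as L₂ and L∞: models trained against them do not generalize to unseen attack types.
Achieve robustness that generalizes across diverse, unforeseen perturbation types by modeling the perceptual threat model directly.
Validate that the neural perceptual distance (LPIPS) correlates well with human perception, enabling scalable adversarial training.
Show that training against a broad perceptual threat model generalizes well to both targeted and untargeted attacks, and that it is likewise effective against common corruptions (e.g., blur, noise, weather changes).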

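Proposed method

Define the perceptual adversarial threat model in terms of the true perceptual distance d*, which formalizes the set of all perturbations imperceptible to humans.
Approximate the true perceptual distance d*, which cannot be computed, with LPIPS, a learned perceptual similarity metric based on deep network activations.
Propose the neural perceptual threat model (NPTM), which contains all adversarial examples within a bounded LPIPS distance of a natural image.
Develop new perceptual adversarial attacks that use projected gradient descent (PGD) under an LPIPS constraint to generate imperceptible adversarial examples.
Perform adversarial training against these perceptual attacks, yielding Perceptual Adversarial Training (PAT).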
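Compute LPIPS for both attack and defense using either the classifier being trained itself (PAT-self) or a pretrained model such as AlexNet (PAT-AlexNet), so that the resulting robustness transfers to other attacks.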
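The two ingredients named in the list above, a distance computed on network activations and a constraint that keeps adversarial examples within a bound of that distance, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `toy_features` is a hypothetical stand-in for a real backbone such as AlexNet, the normalization is applied to a whole layer rather than per spatial location across channels as in real LPIPS, and the bisection-based `project_to_ball` is one simple way to enforce the bound, not necessarily the procedure used in the paper.

```python
import math


def toy_features(x):
    """Hypothetical stand-in for deep-network activations (not a real LPIPS backbone).

    x is a flat list of pixel values in [0, 1]; returns a list of layers.
    """
    layer1 = [math.tanh(2.0 * v - 1.0) for v in x]
    layer2 = [a * b for a, b in zip(layer1, layer1[1:])]
    return [layer1, layer2]


def normalize(layer):
    """Scale a layer to unit L2 norm, then divide by sqrt(layer size)."""
    norm = math.sqrt(sum(a * a for a in layer)) or 1.0  # avoid division by zero
    scale = norm * math.sqrt(len(layer))
    return [a / scale for a in layer]


def perceptual_distance(x1, x2, features=toy_features):
    """L2 distance between concatenated, normalized activations of x1 and x2."""
    f1 = [a for layer in features(x1) for a in normalize(layer)]
    f2 = [a for layer in features(x2) for a in normalize(layer)]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))


def project_to_ball(x, x_adv, eps, steps=30):
    """Shrink the perturbation along the segment x -> x_adv by bisection.

    Only step sizes that were verified to satisfy the bound are kept, so the
    returned point always has perceptual_distance(x, result) <= eps.
    """
    if perceptual_distance(x, x_adv) <= eps:
        return list(x_adv)
    lo, hi = 0.0, 1.0  # lo = 0 corresponds to x itself, which is at distance 0
    for _ in range(steps):
        mid = (lo + hi) / 2
        candidate = [a + mid * (b - a) for a, b in zip(x, x_adv)]
        if perceptual_distance(x, candidate) <= eps:
            lo = mid
        else:
            hi = mid
    return [a + lo * (b - a) for a, b in zip(x, x_adv)]


# Illustrative values only: a 4-"pixel" image and an arbitrary perturbed copy.
x = [0.2, 0.5, 0.8, 0.4]
x_adv = [0.9, 0.1, 0.3, 0.95]
projected = project_to_ball(x, x_adv, eps=0.05)
```

In a PGD-style attack, a projection of this kind would be applied after each gradient step so that the iterate stays inside the perceptual ball; the gradient step itself is omitted here because it requires a differentiable model.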

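Experimental results

Research questions

RQ1: Does a defense against a broad perceptual threat model generalize to attack types not seen during training?
RQ2: How well does LPIPS distance correlate with human judgments of image perturbations, compared with conventional Lp norms?
RQ3: Does adversarial training under the neural perceptual threat model (NPTM) yield better robustness than standard adversarial training under L₂ or L∞ constraints?
RQ4: Does PAT also generalize to natural corruptions (e.g., blur, noise, weather) that were not explicitly targeted during training?
RQ5: Compared with standard adversarial training methods, does PAT incur a trade-off between clean accuracy and robustness?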

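Key findings

On CIFAR-10, PAT achieves state-of-the-art robustness against the union of five diverse attacks (L₂, L∞, spatial transformation, recoloring, and JPEG), more than doubling the accuracy of the next best model without training against any of them.
On CIFAR-10-C, the relative mean corruption error (mCE) is 0.50 for PAT-self and 0.49 for PAT-AlexNet, lower than for L₂ adversarial training (0.54) and L∞ adversarial training (0.57).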
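On ImageNet-100-C, PAT-self reaches a relative mCE of 0.37 and PAT-AlexNet 0.39, better than L₂ (0.41) and L∞ (0.42) adversarial training; PAT leads on every corruption type except noise, where L₂ adversarial training is best.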
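A perceptual study shows that LPIPS correlates strongly with human perception, supporting its use as a surrogate for the true perceptual distance.
PAT's robustness also extends to natural corruptions, indicating that robustness to worst-case perceptual perturbations carries over to random, real-world distortions.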
PAT reaches this robustness while retaining a reported clean accuracy of 93.4% on CIFAR-10, indicating a favorable accuracy-robustness trade-off relative to prior methods.
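Accuracy "against the union" of several attacks is stricter than accuracy against any single attack: an example counts as robust only if the model classifies it correctly under every attack in the set. A minimal sketch of that bookkeeping follows; the function name `union_accuracy` and the per-example outcomes are hypothetical placeholders for illustration, not results from the paper.

```python
def union_accuracy(correct_by_attack):
    """Fraction of examples classified correctly under every attack.

    correct_by_attack maps an attack name to a list of booleans, one per
    test example, in the same example order for every attack.
    """
    per_attack = list(correct_by_attack.values())
    n = len(per_attack[0])
    if any(len(c) != n for c in per_attack):
        raise ValueError("all attacks must cover the same examples")
    survived = [all(c[i] for c in per_attack) for i in range(n)]
    return sum(survived) / n


# Made-up outcomes for four examples under three attacks (placeholders only).
outcomes = {
    "L2": [True, True, False, True],
    "Linf": [True, False, False, True],
    "JPEG": [True, True, True, True],
}
per_attack_accuracy = {name: sum(c) / len(c) for name, c in outcomes.items()}
union = union_accuracy(outcomes)
```

Because an example must survive all attacks, the union accuracy can never exceed the lowest per-attack accuracy, which is why it is a demanding summary of robustness across threat models.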

This review was generated by AI and checked by a human editor.