QUICK REVIEW

[Paper Review] Eliminating Catastrophic Overfitting Via Abnormal Adversarial Examples Regularization

Runqi Lin, Chaojian Yu|arXiv (Cornell University)|Apr 11, 2024

Industrial Vision Systems and Defect Detection|Citations: 5

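One-line summary

This paper introduces AAER, a regularization method that curbs catastrophic overfitting in single-step adversarial training by suppressing abnormal adversarial examples, improving robustness with minimal overhead.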

ABSTRACT

Single-step adversarial training (SSAT) has demonstrated the potential to achieve both efficiency and robustness. However, SSAT suffers from catastrophic overfitting (CO), a phenomenon that leads to a severely distorted classifier, making it vulnerable to multi-step adversarial attacks. In this work, we observe that some adversarial examples generated on the SSAT-trained network exhibit anomalous behaviour, that is, although these training samples are generated by the inner maximization process, their associated loss decreases instead, which we named abnormal adversarial examples (AAEs). Upon further analysis, we discover a close relationship between AAEs and classifier distortion, as both the number and outputs of AAEs undergo a significant variation with the onset of CO. Given this observation, we re-examine the SSAT process and uncover that before the occurrence of CO, the classifier already displayed a slight distortion, indicated by the presence of few AAEs. Furthermore, the classifier directly optimizing these AAEs will accelerate its distortion, and correspondingly, the variation of AAEs will sharply increase as a result. In such a vicious circle, the classifier rapidly becomes highly distorted and manifests as CO within a few iterations. These observations motivate us to eliminate CO by hindering the generation of AAEs. Specifically, we design a novel method, termed Abnormal Adversarial Examples Regularization (AAER), which explicitly regularizes the variation of AAEs to hinder the classifier from becoming distorted. Extensive experiments demonstrate that our method can effectively eliminate CO and further boost adversarial robustness with negligible additional computational overhead.
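The abstract's description of an abnormal adversarial example can be written down compactly. The following is a sketch in assumed notation, not taken from this page: \ell is the training loss, f_\theta the classifier, \eta the random initialization of the single-step attack (\eta = 0 for plain FGSM), \alpha the step size, and \epsilon the perturbation budget.

```latex
\begin{align*}
% Single-step perturbation (RS-FGSM style; symbols are assumptions)
\delta &= \Pi_{[-\epsilon,\,\epsilon]}\!\Big( \eta + \alpha \cdot
          \operatorname{sign}\big( \nabla_{x+\eta}\, \ell(f_\theta(x+\eta),\, y) \big) \Big) \\
% Normal adversarial example: the attack step does not lower the loss
\ell\big(f_\theta(x+\delta),\, y\big) &\ge \ell\big(f_\theta(x+\eta),\, y\big) \\
% Abnormal adversarial example (AAE): the attack step lowers the loss
\ell\big(f_\theta(x+\delta),\, y\big) &< \ell\big(f_\theta(x+\eta),\, y\big)
\end{align*}
```

Under this reading, an AAE is a training sample for which the inner maximization step moved in a direction that made the classifier's job easier rather than harder.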

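Motivation and objectives

Investigate the link between abnormal adversarial examples (AAEs) and catastrophic overfitting (CO) in SSAT.
Characterize how AAEs behave and change during training and at the onset of CO.
Propose and validate AAER, which regularizes AAEs to prevent CO with negligible additional computation.
Show that AAER is effective across datasets, attacks, and architectures.
Evaluate AAER's computational efficiency relative to other defenses.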

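Proposed method

Define AAEs as adversarial examples whose loss decreases, rather than increases, after the inner-maximization perturbation is applied.
Quantify the number of AAEs and the variation in their outputs during training, and relate both to CO.
Develop the AAER regularizer, which penalizes (i) the number of AAEs, (ii) the abnormal change in their prediction confidence, and (iii) the variation in their logit distribution.
Combine these components into a single regularization term, AAER, with tunable hyperparameters λ1, λ2, and λ3.
Show that AAER needs no additional example generation or backpropagation passes, keeping overhead low.
Evaluate AAER on CIFAR-10/100, SVHN, Tiny-ImageNet, and ImageNet-100 using PreActResNet-18 and WideResNet-34, with RS-FGSM and N-FGSM as baselines.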
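The AAE definition and the three penalized quantities listed above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's exact formula: the function name `aaer_sketch`, the use of the loss drop as a proxy for the confidence change, the squared logit shift, and the plain weighted sum with `lam1`, `lam2`, `lam3` are all hypothetical choices made for this example.

```python
import numpy as np


def cross_entropy(logits, labels):
    """Per-example cross-entropy from raw logits (numerically stable)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]


def aaer_sketch(logits_before, logits_after, labels, lam1=1.0, lam2=1.0, lam3=1.0):
    """Illustrative AAE-based penalty (hypothetical; NOT the paper's formula).

    logits_before: logits at the starting point of the inner maximization
    logits_after:  logits at the generated adversarial example
    """
    loss_before = cross_entropy(logits_before, labels)
    loss_after = cross_entropy(logits_after, labels)

    # An example is "abnormal" when the attack step lowered its loss.
    is_aae = loss_after < loss_before
    if not is_aae.any():
        return 0.0, is_aae

    # (i) share of abnormal examples in the batch
    aae_fraction = is_aae.mean()
    # (ii) how far the loss moved in the wrong direction (confidence proxy)
    loss_drop = (loss_before - loss_after)[is_aae].mean()
    # (iii) how much the logits of abnormal examples shifted
    logit_shift = ((logits_after - logits_before)[is_aae] ** 2).sum(axis=1).mean()

    penalty = lam1 * aae_fraction + lam2 * loss_drop + lam3 * logit_shift
    return float(penalty), is_aae


# Toy batch: the attack lowers the loss of the first example only.
labels = np.array([0, 1])
before = np.array([[2.0, 0.0], [0.0, 2.0]])
after = np.array([[3.0, 0.0], [0.0, 1.0]])
penalty, mask = aaer_sketch(before, after, labels)
```

Both sets of logits come from forward passes that single-step adversarial training already performs, which is consistent with the claim that no extra example generation or backpropagation pass is needed. In an actual training loop the penalty would have to be computed in a differentiable framework and added to the training loss.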

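Experimental results

Research questions

RQ1: How do AAEs relate to classifier distortion and CO during SSAT?
RQ2: Can suppressing AAEs (their number and output variation) prevent CO while maintaining or improving robustness?
RQ3: How does AAER perform, in terms of robustness and efficiency, across datasets, attacks, and network architectures?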

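Key findings

AAEs appear early, before CO, then surge once CO begins; their growth correlates with the drop in robustness.
The output variation of AAEs, especially in their logits, grows dramatically during CO, suggesting a distorted decision boundary.
Passively removing AAEs delays CO but does not prevent it, which motivates active regularization.
AAER effectively prevents CO across noise magnitudes and datasets with almost no overhead.
AAER improves robustness while keeping training time close to that of the single-step adversarial training baselines.
AAER's effectiveness is demonstrated on CIFAR-10 and CIFAR-100 using ResNet-style (RN) architectures against comparable baselines.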
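To make the contrast between passive removal and active regularization concrete, here is a minimal sketch of what passive removal could look like. It assumes that "passive removal" means dropping AAEs from the batch loss, which is an interpretation rather than something this review states; `loss_without_aaes` is a hypothetical helper.

```python
import numpy as np


def loss_without_aaes(loss_before, loss_after):
    """Mean adversarial loss over normal examples only (AAEs are masked out)."""
    loss_before = np.asarray(loss_before, dtype=float)
    loss_after = np.asarray(loss_after, dtype=float)
    # Keep examples whose loss did not decrease after the attack step.
    keep = loss_after >= loss_before
    if not keep.any():
        return 0.0, keep
    return float(loss_after[keep].mean()), keep


# Toy per-example losses: the second example is an AAE (1.0 -> 0.4).
batch_loss, keep = loss_without_aaes([0.5, 1.0, 0.2], [0.9, 0.4, 0.6])
```

Masking of this kind only stops the classifier from training on AAEs that already exist. It adds no penalty on how many AAEs are generated or how strongly their outputs vary, which is the gap the findings above say active regularization addresses.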


This review was generated by AI and checked by a human editor.