QUICK REVIEW

[Paper Review] Attacks Which Do Not Kill Training Make Adversarial Learning Stronger

Jingfeng Zhang, Xilie Xu|arXiv (Cornell University)|2020. 02. 26.

Adversarial Robustness in Machine Learning|References: 50|Citations: 84

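One-Line Summary

The paper proposes Friendly Adversarial Training (FAT). FAT trains on friendly adversarial data, the least adversarial examples among those that are misclassified, which it finds with early-stopped PGD; this improves robustness without hurting natural accuracy. A theoretical upper bound and empirical evidence together show that robustness can be achieved without harming standard generalization.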

ABSTRACT

Adversarial training based on the minimax formulation is necessary for obtaining adversarial robustness of trained models. However, it is conservative or even pessimistic so that it sometimes hurts the natural generalization. In this paper, we raise a fundamental question---do we have to trade off natural generalization for adversarial robustness? We argue that adversarial training is to employ confident adversarial data for updating the current model. We propose a novel approach of friendly adversarial training (FAT): rather than employing most adversarial data maximizing the loss, we search for least adversarial (i.e., friendly adversarial) data minimizing the loss, among the adversarial data that are confidently misclassified. Our novel formulation is easy to implement by just stopping the most adversarial data searching algorithms such as PGD (projected gradient descent) early, which we call early-stopped PGD. Theoretically, FAT is justified by an upper bound of the adversarial risk. Empirically, early-stopped PGD allows us to answer the earlier question negatively---adversarial robustness can indeed be achieved without compromising the natural generalization.

Research Motivation and Goals

Question the necessity of trading off natural generalization for adversarial robustness in standard adversarial training.
Introduce friendly adversarial training (FAT) that uses least adversarial, confidently misclassified data to update models.
Provide theoretical justification via an upper bound on adversarial risk for FAT.
Show that early-stopped PGD can implement FAT efficiently and improve both standard and robust accuracy.
Demonstrate that FAT enables larger perturbation budgets (epsilon) during training while maintaining or improving performance.

Proposed Method

Formulate FAT as minimizing loss over friendly adversarial data, defined by misclassified adversarial samples with a margin constraint rho.
Replace the inner maximization in standard adversarial training with a constrained minimization over adversarial samples that satisfy a confidence margin and minimize the loss.
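Develop PGD-K-τ, an early-stopped PGD algorithm whose output is either a misclassified example with small loss or, if the attack does not succeed within K steps, a correctly classified example with large loss; τ controls how many extra steps run after the first misclassification, and the algorithm generalizes conventional PGD-K.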
Prove a tight upper bound on adversarial risk that incorporates both standard and robust terms and uses a margin parameter ρ.
Provide a practical FAT algorithm that can adapt existing defenses (e.g., TRADES, MART) into FAT variants.
Empirically validate FAT on CIFAR-10 and SVHN with ResNet-18, Small CNN, and Wide ResNet architectures, comparing standard and robust accuracies under various attacks.
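
The first two items above contrast the usual inner maximization with FAT's constrained inner minimization. The block below is a sketch of that contrast reconstructed from the description in this review, not a verbatim copy of the paper's equations. The notation is an assumption: $\ell$ is the loss, $f$ the model, $\mathcal{B}_\epsilon[x_i]$ the closed $\epsilon$-ball around the natural input $x_i$, $\mathcal{Y}$ the label set, and $\rho$ the margin.

```latex
% Outer minimization, shared by standard adversarial training and FAT:
\min_{f} \; \frac{1}{n} \sum_{i=1}^{n} \ell\bigl(f(\tilde{x}_i),\, y_i\bigr)

% Standard adversarial training: most adversarial data (inner maximization)
\tilde{x}_i = \operatorname*{arg\,max}_{\tilde{x} \in \mathcal{B}_\epsilon[x_i]}
              \ell\bigl(f(\tilde{x}),\, y_i\bigr)

% FAT: least adversarial data among those misclassified with margin \rho
\tilde{x}_i = \operatorname*{arg\,min}_{\tilde{x} \in \mathcal{B}_\epsilon[x_i]}
              \ell\bigl(f(\tilde{x}),\, y_i\bigr)
\quad \text{s.t.} \quad
\ell\bigl(f(\tilde{x}),\, y_i\bigr) - \min_{y \in \mathcal{Y}} \ell\bigl(f(\tilde{x}),\, y\bigr) \ge \rho
```

The constraint says that some wrong label fits $\tilde{x}$ better than the true label $y_i$ by at least $\rho$, which is one way to read "confidently misclassified". The paper itself should be consulted for the exact statement.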

Experimental Results

Research Questions

RQ1: Can adversarial robustness be achieved without sacrificing natural generalization in adversarial training?
RQ2: Does using friendly adversarial data (within misclassified samples) improve training stability and generalization?
RQ3: How does early-stopped PGD (PGD-K-τ) influence the training dynamics and robustness of the model?
RQ4: What theoretical guarantees can justify FAT as an upper-bound approach to adversarial risk?
RQ5: Is FAT compatible with existing adversarial training methods (TRADES, MART) to yield improved performance?

Key Results

FAT improves standard (natural) test accuracy while maintaining competitive robust accuracy across attacks.
Early-stopped PGD (PGD-K-τ) alleviates the cross-over mixture of adversarial data and enables progressive strengthening of robustness during training.
FAT enables larger training perturbation budgets ε_train without harming generalization, unlike conventional adversarial training.
Theoretical upper bound on adversarial risk shows FAT can reduce risk by combining misclassified adversarial data with confidence margins.
FAT variants can be derived from existing methods (e.g., FAT for TRADES, FAT for MART), offering practical paths for deployment.
Empirical results indicate robustness can be enhanced with controlled τ values (e.g., τ in {0,1,2,3}), balancing standard and robust performance.
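
To make the role of τ concrete, here is a minimal sketch of early-stopped PGD as this review describes it: run signed-gradient steps inside the ε-ball, and once the example is misclassified allow at most τ more steps before stopping. The linear softmax classifier, the function names (`softmax`, `early_stopped_pgd`), and the toy numbers are all assumptions made for illustration; the paper's experiments use deep networks, and this is not the authors' code.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()


def early_stopped_pgd(x, y, W, b, eps, alpha, K, tau):
    """Sketch of PGD-K-tau on a toy linear classifier (logits = W @ x + b).

    Stops once the current example is misclassified and the budget of
    tau extra steps is used up; otherwise runs for K steps.
    Returns the perturbed input and the number of gradient steps taken.
    """
    x_adv = x.copy()
    steps = 0
    for _ in range(K):
        logits = W @ x_adv + b
        if logits.argmax() != y:
            if tau == 0:
                break          # already misclassified, no extra steps left
            tau -= 1           # spend one of the tau extra steps
        # Gradient of cross-entropy w.r.t. the input for a linear model:
        # W^T (softmax(logits) - onehot(y)).
        p = softmax(logits)
        p[y] -= 1.0
        grad = W.T @ p
        # Signed-gradient ascent step, then projection onto the L-inf ball.
        x_adv = x_adv + alpha * np.sign(grad)
        x_adv = np.clip(x_adv, x - eps, x + eps)
        steps += 1
    return x_adv, steps


if __name__ == "__main__":
    # Hypothetical 2-class toy problem: identity weights, zero bias.
    W, b = np.eye(2), np.zeros(2)
    x, y = np.array([1.0, 0.5]), 0
    for tau in (0, 2, 10):
        x_adv, steps = early_stopped_pgd(x, y, W, b, eps=0.5, alpha=0.1, K=10, tau=tau)
        loss = -np.log(softmax(W @ x_adv + b)[y])
        print(f"tau={tau}: steps={steps}, loss={loss:.3f}")
```

In this sketch, τ = 0 returns the first misclassified point found, larger τ pushes the example further past the decision boundary, and τ = K never triggers the early stop, so the loop behaves like plain PGD-K.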

This review was generated by AI and reviewed by a human editor.