QUICK REVIEW

[Paper Review] Adversarial Robustness through Local Linearization

Chongli Qin, James Martens | arXiv (Cornell University) | 2019-07-04

Adversarial Robustness in Machine Learning | References: 26 | Citations: 91

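One-Line Summary

The paper introduces a local linearity regularizer (LLR) that encourages the loss to behave linearly near the training data, enabling faster robust training and higher adversarial accuracy than standard adversarial training on CIFAR-10 and ImageNet.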

ABSTRACT

Adversarial training is an effective methodology for training deep neural networks that are robust against adversarial, norm-bounded perturbations. However, the computational cost of adversarial training grows prohibitively as the size of the model and number of input dimensions increase. Further, training against less expensive and therefore weaker adversaries produces models that are robust against weak attacks but break down under attacks that are stronger. This is often attributed to the phenomenon of gradient obfuscation; such models have a highly non-linear loss surface in the vicinity of training examples, making it hard for gradient-based attacks to succeed even though adversarial examples still exist. In this work, we introduce a novel regularizer that encourages the loss to behave linearly in the vicinity of the training data, thereby penalizing gradient obfuscation while encouraging robustness. We show via extensive experiments on CIFAR-10 and ImageNet, that models trained with our regularizer avoid gradient obfuscation and can be trained significantly faster than adversarial training. Using this regularizer, we exceed current state of the art and achieve 47% adversarial accuracy for ImageNet with l-infinity adversarial perturbations of radius 4/255 under an untargeted, strong, white-box attack. Additionally, we match state of the art results for CIFAR-10 at 8/255.

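Research Motivation and Goals

Motivate and address the problem that the high computational cost of adversarial training poses for building robust models.
Propose a regularizer that enforces local linearity of the loss around the training data in order to prevent gradient obfuscation.
Show that Local Linearity Regularization (LLR) trains faster while providing better or comparable robustness against strong attacks.
Evaluate LLR against strong white-box adversaries on CIFAR-10 and ImageNet and compare it with baselines such as ADV, TRADES, and CURE.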

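Proposed Method

Define a local linearity measure gamma(epsilon, x) that captures how far the loss deviates from its first-order Taylor expansion within an epsilon-ball.
Derive the Local Linearity Regularizer (LLR), which penalizes gamma(epsilon, x) and the perturbation term |delta_LLR^T grad_x ell(x)|, with delta_LLR constrained to the epsilon-ball.
Find delta_LLR with an inner optimization via gradient descent, similar in spirit to adversarial training but typically with far fewer steps.
Train robust models with the combined objective L(D) = E[ ell(x) + lambda*gamma(epsilon, x) + mu*|delta_LLR^T grad_x ell(x)| ].
Argue and empirically show that minimizing gamma(epsilon, x) suffices to bound the adversarial loss and reduces gradient obfuscation.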
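
The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation. It assumes that gamma(epsilon, x) is the maximum over the l-infinity epsilon-ball of g(delta; x) = |ell(x + delta) - ell(x) - delta^T grad_x ell(x)| and that delta_LLR is the perturbation attaining that maximum. The toy logistic loss, the signed-gradient inner step, the step size, and all function names (make_logistic_loss, find_delta_llr, llr_objective) are hypothetical choices made for this example, and lam and mu are arbitrary values.

```python
import math
import random


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def sign(v):
    return (v > 0) - (v < 0)


def make_logistic_loss(w, b, y):
    """Toy loss l(x) = log(1 + exp(-y * (w.x + b))) with its input gradient."""
    def loss(x):
        return math.log1p(math.exp(-y * (dot(w, x) + b)))

    def grad(x):
        margin = y * (dot(w, x) + b)
        s = 1.0 / (1.0 + math.exp(margin))  # sigmoid(-margin)
        return [-y * s * wi for wi in w]

    return loss, grad


def linearity_violation(loss, grad, x, delta):
    """g(delta; x) = |l(x + delta) - l(x) - delta^T grad_x l(x)|."""
    x_pert = [xi + di for xi, di in zip(x, delta)]
    return abs(loss(x_pert) - loss(x) - dot(delta, grad(x)))


def find_delta_llr(loss, grad, x, eps, steps=5, seed=0):
    """Approximate the maximizer of g over the l-infinity ball of radius eps."""
    rng = random.Random(seed)
    g0, l0 = grad(x), loss(x)
    # Random start: g has zero gradient at delta = 0, so starting there stalls.
    delta = [rng.uniform(-eps, eps) for _ in x]
    best, best_val = delta, linearity_violation(loss, grad, x, delta)
    for _ in range(steps):
        x_pert = [xi + di for xi, di in zip(x, delta)]
        residual = loss(x_pert) - l0 - dot(delta, g0)
        # d g / d delta = sign(residual) * (grad l(x + delta) - grad l(x))
        g_delta = [sign(residual) * (gp - gi)
                   for gp, gi in zip(grad(x_pert), g0)]
        # Signed ascent step (an assumption), then projection onto the ball.
        delta = [min(max(di + 0.5 * eps * sign(gd), -eps), eps)
                 for di, gd in zip(delta, g_delta)]
        val = linearity_violation(loss, grad, x, delta)
        if val > best_val:
            best, best_val = delta, val
    return best, best_val


def llr_objective(loss, grad, x, eps, lam, mu, steps=5):
    """Per-example value of l(x) + lam * gamma + mu * |delta_LLR^T grad_x l(x)|."""
    delta, gamma = find_delta_llr(loss, grad, x, eps, steps)
    return loss(x) + lam * gamma + mu * abs(dot(delta, grad(x)))


loss, grad = make_logistic_loss(w=[2.0, -1.0, 0.5], b=0.1, y=1)
x = [0.3, 0.7, 0.2]
print(llr_objective(loss, grad, x, eps=8 / 255, lam=1.0, mu=1.0))
```

By the triangle inequality, |ell(x + delta) - ell(x)| <= |delta^T grad_x ell(x)| + g(delta; x) for any delta in the ball, which is why penalizing the two terms together limits how far the loss can rise nearby. The sketch only evaluates the objective for one example; real training would also differentiate it with respect to the model parameters, which is omitted here.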

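Experimental Results

Research Questions

RQ1: Can enforcing local linearity of the loss around training examples reduce gradient obfuscation and improve robustness against strong adversaries?
RQ2: Does LLR train faster than standard adversarial training while matching or exceeding its robustness?
RQ3: How does LLR compare with ADV, TRADES, and DENOISE on CIFAR-10 and ImageNet under strong untargeted and targeted white-box attacks?
RQ4: How does LLR affect the degradation of robustness as the attacker increases the strength of the attack?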

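Key Results

LLR achieves state-of-the-art adversarial accuracy under strong white-box attacks on CIFAR-10 at epsilon = 8/255 and on ImageNet at epsilon = 4/255.
Training with LLR is reported to be up to 5x faster than standard adversarial training on ImageNet.
The adversarial accuracy of LLR-trained models degrades more gracefully as attack strength increases than that of models trained with adversarial training.
On ImageNet, LLR achieves 47% adversarial accuracy at epsilon = 4/255 under untargeted attacks, outperforming several baselines.
On CIFAR-10, LLR achieves 52.81% adversarial accuracy at epsilon = 8/255, matching or exceeding the baselines reported under similar evaluations.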
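
As a reading aid, the adversarial accuracy figures above can be written as the fraction of test points that remain correctly classified under an attack confined to the l-infinity ball. The notation below (classifier f, attack A, test-set size n) is introduced here for illustration and is not taken from the paper, and the radii are assumed to apply to pixel values scaled to [0, 1].

```latex
\mathrm{Acc}_{\mathrm{adv}}(\epsilon)
  = \frac{1}{n} \sum_{i=1}^{n}
    \mathbf{1}\!\left[ f\big(x_i + A(x_i, y_i)\big) = y_i \right],
\qquad \lVert A(x_i, y_i) \rVert_\infty \le \epsilon,
\qquad \tfrac{8}{255} \approx 0.0314, \quad \tfrac{4}{255} \approx 0.0157 .
```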

This review was generated by AI and reviewed by a human editor.