QUICK REVIEW

[Paper Review] Provable defenses against adversarial examples via the convex outer adversarial polytope

Eric Wong, J. Zico Kolter | arXiv (Cornell University) | 2017-11-02

Adversarial Robustness in Machine Learning | Citations: 712

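One-line summary

This paper presents a method for training deep ReLU classifiers that are provably robust to norm-bounded adversarial perturbations by optimizing over a convex outer bound of the adversarial polytope. A dual network makes training efficient; the method achieves certified robustness on MNIST and other datasets and outperforms existing bounds on several tasks.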

ABSTRACT

We propose a method to learn deep ReLU-based classifiers that are provably robust against norm-bounded adversarial perturbations on the training data. For previously unseen examples, the approach is guaranteed to detect all adversarial examples, though it may flag some non-adversarial examples as well. The basic idea is to consider a convex outer approximation of the set of activations reachable through a norm-bounded perturbation, and we develop a robust optimization procedure that minimizes the worst case loss over this outer region (via a linear program). Crucially, we show that the dual problem to this linear program can be represented itself as a deep network similar to the backpropagation network, leading to very efficient optimization approaches that produce guaranteed bounds on the robust loss. The end result is that by executing a few more forward and backward passes through a slightly modified version of the original network (though possibly with much larger batch sizes), we can learn a classifier that is provably robust to any norm-bounded adversarial attack. We illustrate the approach on a number of tasks to train classifiers with robust adversarial guarantees (e.g. for MNIST, we produce a convolutional classifier that provably has less than 5.8% test error for any adversarial attack with bounded $\ell_\infty$ norm less than $\epsilon = 0.1$), and code for all experiments in the paper is available at https://github.com/locuslab/convex_adversarial.

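Research motivation and goals

Motivate and quantify the need for classifiers that are provably robust to norm-bounded adversarial perturbations.
Introduce a convex outer bound (convex relaxation) on the adversarial polytope of a deep ReLU network.
Develop a dual-network approach for efficiently computing bounds on the robust loss during training.
Use this training objective to obtain provably robust classifiers and adversarial-example detection on unseen data.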

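Proposed method

Define the adversarial polytope Z_epsilon(x) for a k-layer ReLU network.
Replace each ReLU constraint with its convex envelope, a linear relaxation, to form the tractable outer bound tilde{Z}_epsilon(x).
Derive the dual of the resulting linear program; the dual takes the form of a backpropagation-like network and yields the bound J_epsilon(x, g_theta).
Use Algorithm 1, which exploits the dual structure, to compute the activation bounds ell and u.
Train with the robust loss L(-J_epsilon(...), y), which by Theorem 2 upper-bounds the worst-case loss over the epsilon-ball.
Provide certified robustness guarantees via Corollaries 1 and 2, and compute the epsilon-distance to the decision boundary (Eq. 17).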
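The steps above can be sketched on a toy case. The following is a minimal illustration, not the authors' implementation (their code is at the repository linked in the abstract). It assumes a single hidden layer, an l_inf ball, and a fixed slope u/(u - l) for every unstable ReLU; that slope choice is an assumption of this sketch. The function names `affine`, `objective`, and `certified_lower_bound`, and the random toy weights, are hypothetical. The sketch relaxes each unstable ReLU to its triangle-shaped convex envelope, turns that into a linear lower bound on the objective, and minimizes the linear bound in closed form over the ball.

```python
import random


def affine(W, b, v):
    """Return W v + b, with W given as a list of rows."""
    return [sum(w * t for w, t in zip(row, v)) + bi for row, bi in zip(W, b)]


def objective(W1, b1, W2, b2, c, x):
    """Exact value of c^T f(x) for f(x) = W2 relu(W1 x + b1) + b2."""
    hidden = [max(t, 0.0) for t in affine(W1, b1, x)]
    return sum(ck * ok for ck, ok in zip(c, affine(W2, b2, hidden)))


def certified_lower_bound(W1, b1, W2, b2, c, x, eps):
    """Lower bound on c^T f(x + delta) over all ||delta||_inf <= eps."""
    # Elementwise pre-activation bounds l <= W1 (x + delta) + b1 <= u.
    center = affine(W1, b1, x)
    radius = [eps * sum(abs(w) for w in row) for row in W1]
    l = [m - r for m, r in zip(center, radius)]
    u = [m + r for m, r in zip(center, radius)]

    # a[j] is the weight of hidden unit j in the objective c^T (W2 z + b2).
    a = [sum(ck * row[j] for ck, row in zip(c, W2)) for j in range(len(l))]
    const = sum(ck * bk for ck, bk in zip(c, b2))

    lam = []  # slopes of a linear lower bound in the pre-activations
    for aj, lj, uj in zip(a, l, u):
        if uj <= 0:        # always inactive: relu output is 0
            lam.append(0.0)
        elif lj >= 0:      # always active: relu is the identity
            lam.append(aj)
        else:              # unstable: use the triangle relaxation
            d = uj / (uj - lj)
            lam.append(d * aj)
            if aj < 0:     # the upper face z <= d * (zhat - l) is the binding one
                const -= aj * d * lj

    # Minimize the linear bound over the l_inf ball (the dual norm is l_1).
    q = [sum(lam[j] * W1[j][i] for j in range(len(lam))) for i in range(len(x))]
    return (sum(qi * xi for qi, xi in zip(q, x))
            + sum(s * bj for s, bj in zip(lam, b1))
            + const
            - eps * sum(abs(qi) for qi in q))


# Toy usage with hypothetical random weights: 3 inputs, 4 hidden units, 2 logits.
rng = random.Random(0)
W1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
b1 = [rng.uniform(-1, 1) for _ in range(4)]
W2 = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]
b2 = [rng.uniform(-1, 1) for _ in range(2)]
x, c, eps = [0.2, -0.1, 0.4], [1.0, -1.0], 0.1  # c = logit 0 minus logit 1

bound = certified_lower_bound(W1, b1, W2, b2, c, x, eps)
sampled = min(
    objective(W1, b1, W2, b2, c, [xi + rng.uniform(-eps, eps) for xi in x])
    for _ in range(2000)
)
print(f"certified lower bound: {bound:.4f}, worst sampled value: {sampled:.4f}")
```

With c set to the true-class logit minus another class's logit, a positive bound means that class cannot overtake the true class anywhere in the ball. Random sampling only finds values at or above the bound; it cannot certify anything itself.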

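Experimental results

Research questions

RQ1: Can deep ReLU networks be trained to be provably robust to norm-bounded adversarial perturbations?
RQ2: Can a tight bound on the robust loss be computed efficiently through a dual formulation that resembles standard backpropagation?
RQ3: How strong are the empirical robustness guarantees on MNIST, Fashion-MNIST, HAR, and SVHN compared with non-robust baselines and other robust methods?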

Key results

Problem	Robust	ε	Test error	FGSM error	PGD error	Robust error bound
MNIST	×	0.1	1.07%	50.01%	81.68%	100%
MNIST	√	0.1	1.80%	3.93%	4.11%	5.82%
Fashion-MNIST	×	0.1	9.36%	77.98%	81.85%	100%
Fashion-MNIST	√	0.1	21.73%	31.25%	31.63%	34.53%
HAR	×	0.05	4.95%	60.57%	63.82%	81.56%
HAR	√	0.05	7.80%	21.49%	21.52%	21.90%
SVHN	×	0.01	16.01%	62.21%	83.43%	100%
SVHN	√	0.01	20.38%	33.28%	33.74%	40.67%

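On MNIST, the robust model achieves a certified robust test error bound of 5.82% under l_inf perturbations with epsilon = 0.1, whereas the non-robust model has a vacuous robust bound of 100% and far higher actual error under attack.
On MNIST, the robust model cuts FGSM and PGD error to 3.93% and 4.11%, respectively, compared with 50.01% and 81.68% for the standard model.
Across datasets, the certified bound for the robust models stays close to the PGD error, which it must upper-bound (e.g., Fashion-MNIST: 34.53% robust error bound vs. 31.63% PGD error, the same order of magnitude).
The method scales to convolutional networks and medium-sized problems, producing what the authors describe as the largest verified networks to date (e.g., on MNIST).
Attack detection is guaranteed to have no false negatives: if the bound certifies an example as robust, no perturbation within epsilon can make it adversarial.
With the dual network, the robust bound is computed efficiently in a single backward-style pass, avoiding a traditional LP solver.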
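The comparisons above can be checked directly from the table. The short script below is only a convenience sketch: the numbers are copied from the table in this review, and the variable names (`ROWS`, `summary`, `bound_is_consistent`) are hypothetical. It verifies that no attack error exceeds the reported bound, then computes, for each robust model, the gap between the bound and the PGD error and the clean test error given up relative to the standard model.

```python
# (problem, robust training?, eps, test, FGSM, PGD, robust error bound); errors in %
ROWS = [
    ("MNIST", False, 0.1, 1.07, 50.01, 81.68, 100.0),
    ("MNIST", True, 0.1, 1.80, 3.93, 4.11, 5.82),
    ("Fashion-MNIST", False, 0.1, 9.36, 77.98, 81.85, 100.0),
    ("Fashion-MNIST", True, 0.1, 21.73, 31.25, 31.63, 34.53),
    ("HAR", False, 0.05, 4.95, 60.57, 63.82, 81.56),
    ("HAR", True, 0.05, 7.80, 21.49, 21.52, 21.90),
    ("SVHN", False, 0.01, 16.01, 62.21, 83.43, 100.0),
    ("SVHN", True, 0.01, 20.38, 33.28, 33.74, 40.67),
]

# A valid certificate can never sit below the error that an actual attack achieves.
bound_is_consistent = all(
    fgsm <= bound and pgd <= bound for _, _, _, _, fgsm, pgd, bound in ROWS
)

summary = {}
for name, robust, eps, test, fgsm, pgd, bound in ROWS:
    entry = summary.setdefault(name, {})
    if robust:
        entry["robust_test"] = test
        entry["bound_minus_pgd"] = round(bound - pgd, 2)
    else:
        entry["standard_test"] = test

for name, entry in summary.items():
    entry["clean_cost"] = round(entry["robust_test"] - entry["standard_test"], 2)
    print(f"{name}: bound - PGD = {entry['bound_minus_pgd']} points, "
          f"clean test error cost = {entry['clean_cost']} points")
```

On these numbers the bound exceeds the PGD error by 1.71 points (MNIST), 2.90 (Fashion-MNIST), 0.38 (HAR), and 6.93 (SVHN), while robust training costs 0.73, 12.37, 2.85, and 4.37 points of clean test error, respectively.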

This review was generated by AI and reviewed by a human editor.