QUICK REVIEW

[Paper Review] Provably Robust Deep Learning via Adversarially Trained Smoothed Classifiers

Hadi Salman, Greg Yang | arXiv (Cornell University) | 2019-06-09

Adversarial Robustness in Machine Learning | References: 39 | Citations: 131

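One-Line Summary

This paper improves the provable $\ell_2$ robustness of randomized smoothing through adversarial training with a new attack, SmoothAdv, and achieves state-of-the-art results on ImageNet and CIFAR-10.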

ABSTRACT

Recent works have shown the effectiveness of randomized smoothing as a scalable technique for building neural network-based classifiers that are provably robust to $\ell_2$-norm adversarial perturbations. In this paper, we employ adversarial training to improve the performance of randomized smoothing. We design an adapted attack for smoothed classifiers, and we show how this attack can be used in an adversarial training setting to boost the provable robustness of smoothed classifiers. We demonstrate through extensive experimentation that our method consistently outperforms all existing provably $\ell_2$-robust classifiers by a significant margin on ImageNet and CIFAR-10, establishing the state-of-the-art for provable $\ell_2$-defenses. Moreover, we find that pre-training and semi-supervised learning boost adversarially trained smoothed classifiers even further. Our code and trained models are available at http://github.com/Hadisalman/smoothing-adversarial .

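Research Motivation and Goals

Use adversarial training to improve the provable $\ell_2$ robustness of smoothed classifiers.
Develop an effective attack tailored to smoothed classifiers (SmoothAdv).
Demonstrate gains in both empirical and certified robustness on ImageNet and CIFAR-10.
Show the benefit of pre-training and semi-supervised learning within this framework.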

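Proposed Method

Introduces the SmoothAdv attack for smoothed classifiers, optimized with projected gradient descent (PGD) or decoupled direction and norm (DDN).
Formulates adversarial training by maximizing the loss of the smoothed soft classifier and training on Gaussian-perturbed adversarial examples.
Estimates the gradient of the SmoothAdv objective by Monte Carlo sampling of Gaussian noise.
Uses randomized smoothing to obtain certified $\ell_2$ robustness guarantees for the resulting model.
Adds pre-training and semi-supervised learning to raise robustness and certified accuracy.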
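
The attack and gradient estimate listed above can be sketched in a few lines. The SmoothAdv objective maximizes $-\log \mathbb{E}_{\delta \sim \mathcal{N}(0, \sigma^2 I)}[F(x' + \delta)_y]$ over $\|x' - x\|_2 \le \epsilon$, and the expectation is replaced by an average over $m$ noise samples. The sketch below is illustrative only and is not the authors' implementation: the linear-softmax model stands in for a neural network, the function names, the step size `2 * eps / steps`, and the toy numbers are all assumptions, and the noise is drawn once and held fixed so the loss before and after the attack is directly comparable.

```python
import math
import random


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def prob_and_grad(W, b, x, y):
    """Return F(x)_y and its gradient in x for the toy model F(x) = softmax(Wx + b)."""
    logits = [sum(w * v for w, v in zip(row, x)) + bk for row, bk in zip(W, b)]
    p = softmax(logits)
    avg_row = [sum(p[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]
    return p[y], [p[y] * (W[y][j] - avg_row[j]) for j in range(len(x))]


def smoothadv_loss_and_grad(W, b, x_adv, y, noises):
    """Monte Carlo estimate of J(x') = -log mean_i F(x' + delta_i)_y and its gradient."""
    prob_sum = 0.0
    grad_sum = [0.0] * len(x_adv)
    for delta in noises:
        noisy = [a + n for a, n in zip(x_adv, delta)]
        p_y, g = prob_and_grad(W, b, noisy, y)
        prob_sum += p_y
        grad_sum = [s + gi for s, gi in zip(grad_sum, g)]
    loss = -math.log(prob_sum / len(noises))
    grad = [-g / prob_sum for g in grad_sum]  # gradient of -log(mean probability)
    return loss, grad


def smoothadv_pgd(W, b, x, y, noises, eps, steps):
    """l2 PGD ascent on the Monte Carlo loss, projected onto the eps-ball around x."""
    x_adv = list(x)
    alpha = 2.0 * eps / steps  # assumed step size, not taken from the paper
    for _ in range(steps):
        _, g = smoothadv_loss_and_grad(W, b, x_adv, y, noises)
        g_norm = math.sqrt(sum(v * v for v in g))
        if g_norm == 0.0:
            break
        x_adv = [a + alpha * v / g_norm for a, v in zip(x_adv, g)]
        diff = [a - o for a, o in zip(x_adv, x)]
        d_norm = math.sqrt(sum(v * v for v in diff))
        if d_norm > eps:  # project back onto the l2 ball
            x_adv = [o + v * eps / d_norm for o, v in zip(x, diff)]
    return x_adv


# Toy usage: 2 classes, 2 features, m = 8 noise samples with sigma = 0.25.
rng = random.Random(0)
W, b = [[1.0, 0.5], [-1.0, 0.2]], [0.0, 0.0]
x, y = [0.8, 0.1], 0
noises = [[rng.gauss(0.0, 0.25) for _ in x] for _ in range(8)]
x_adv = smoothadv_pgd(W, b, x, y, noises, eps=0.5, steps=4)
loss_before, _ = smoothadv_loss_and_grad(W, b, x, y, noises)
loss_after, _ = smoothadv_loss_and_grad(W, b, x_adv, y, noises)
print(loss_before, loss_after)
```

In adversarial training, the model would then be updated on noisy copies of `x_adv` rather than of `x`; that training loop is omitted here.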

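Experimental Results

Research Questions

RQ1: Can adversarial training tailored to smoothed classifiers improve certified $\ell_2$ robustness beyond existing methods?
RQ2: How effective is the SmoothAdv attack at finding adversarial examples for smoothed classifiers?
RQ3: How do the training hyperparameters (m_train, sigma, epsilon, T) affect certified robustness?
RQ4: Do pre-training and semi-supervised learning further improve certified robustness in this framework?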

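Key Results

Smoothed classifiers trained with SmoothAdv outperform all existing provably $\ell_2$-robust classifiers in certified accuracy across a range of radii on ImageNet and CIFAR-10.
On ImageNet, a ResNet-50 smoothed classifier reaches 56% provable top-1 accuracy under perturbations with $\ell_2$ norm less than 127/255, up from the previous 49%.
On CIFAR-10, the smoothed classifiers improve on prior work by up to 16%, and combining the method with pre-training and semi-supervised learning yields further gains (up to 22%).
Pre-training and semi-supervised learning consistently raise certified robustness in this framework.
Attack-driven adversarial training aligns model optimization with the certification objective and yields higher certified robustness than standard adversarial training.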
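
To make "certified accuracy at a radius" concrete: the randomized smoothing certificate of Cohen et al. (2019), which this line of work builds on, certifies an $\ell_2$ radius $R = \sigma \, \Phi^{-1}(\underline{p_A})$, where $\underline{p_A} > 1/2$ is a lower confidence bound on the probability of the top class under Gaussian noise. The sketch below assumes that formula; the function names and the noise level `sigma = 0.5` in the usage lines are hypothetical choices for illustration, not settings quoted in this review.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard Gaussian, mean 0 and standard deviation 1


def certified_radius(p_a_lower, sigma):
    """l2 radius sigma * Phi^{-1}(p_a_lower); 0.0 means the certificate abstains."""
    if not 0.0 < p_a_lower < 1.0:
        raise ValueError("p_a_lower must lie strictly between 0 and 1")
    if p_a_lower <= 0.5:
        return 0.0
    return sigma * _STD_NORMAL.inv_cdf(p_a_lower)


def required_probability(radius, sigma):
    """Smallest lower bound on the top-class probability that certifies `radius`."""
    return _STD_NORMAL.cdf(radius / sigma)


# How confident must the smoothed classifier be to certify radius 127/255?
sigma = 0.5  # hypothetical noise level
p_needed = required_probability(127 / 255, sigma)
print(p_needed, certified_radius(p_needed, sigma))
```

A test point counts toward certified accuracy at radius $r$ only if the smoothed prediction is correct and its certified radius is at least $r$.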


This review was generated by AI and checked by a human editor.