QUICK REVIEW

[Paper Review] Hessian-based Analysis of Large Batch Training and Robustness to Adversaries

Zhewei Yao, Amir Gholami | arXiv (Cornell University) | 2018-02-22

Adversarial Robustness in Machine Learning | References: 22 | Citations: 65

One-Line Summary

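This paper performs a Hessian-based analysis of large-batch neural network training. It shows that large batches converge to high-curvature regions that are more vulnerable to adversarial perturbations, and that robust optimization counteracts this by biasing training toward flat minima.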

ABSTRACT

Large batch size training of Neural Networks has been shown to incur accuracy loss when trained with the current methods. The exact underlying reasons for this are still not completely understood. Here, we study large batch size training through the lens of the Hessian operator and robust optimization. In particular, we perform a Hessian based study to analyze exactly how the landscape of the loss function changes when training with large batch size. We compute the true Hessian spectrum, without approximation, by back-propagating the second derivative. Extensive experiments on multiple networks show that saddle-points are not the cause for generalization gap of large batch size training, and the results consistently show that large batch converges to points with noticeably higher Hessian spectrum. Furthermore, we show that robust training allows one to favor flat areas, as points with large Hessian spectrum show poor robustness to adversarial perturbation. We further study this relationship, and provide empirical and theoretical proof that the inner loop for robust training is a saddle-free optimization problem almost everywhere. We present detailed experiments with five different network architectures, including a residual network, tested on MNIST, CIFAR-10, and CIFAR-100 datasets. We have open sourced our method which can be accessed at [1].
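The "inner loop for robust training" in the abstract is the inner maximization of a min-max problem. A common way to write it is shown below; the notation, the per-example loss symbol ℓ, and the L-infinity budget ε are assumptions in the standard adversarial-training style, not copied from the paper. FGSM replaces the inner maximization with a single linearization: over the L-infinity ball, the maximizer of the linearized loss is exactly ε times the sign of the input gradient.

```latex
\min_{\theta}\; \frac{1}{N}\sum_{i=1}^{N}\;
  \max_{\|\delta_i\|_\infty \le \epsilon}\;
  \ell\big(x_i + \delta_i,\; y_i;\; \theta\big)

% First-order (FGSM) approximation of the inner maximization:
\delta_i^{\mathrm{FGSM}}
  = \arg\max_{\|\delta\|_\infty \le \epsilon} \nabla_x \ell(x_i, y_i; \theta)^{\top} \delta
  = \epsilon\, \operatorname{sign}\!\big(\nabla_x \ell(x_i, y_i; \theta)\big)
```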

Motivation and Goals

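Use the true Hessian spectrum to investigate how large batch sizes change the loss landscape compared with small batches.
Examine the relationship between large-batch training and robustness to adversarial perturbations.
Explore how robust optimization affects the Hessian spectrum and the decision boundary.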
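Why a "higher Hessian spectrum" means a sharper, less flat region can be seen from a second-order Taylor expansion. This is a standard argument, not a result quoted from the paper, and it assumes the gradient is negligible at the converged point θ*. For a unit direction v in parameter space, the loss increase under a perturbation of size ε is governed by the curvature along v, and it is largest along the top eigenvector of the Hessian H:

```latex
L(\theta^{*} + \epsilon v) - L(\theta^{*})
  \;\approx\; \tfrac{1}{2}\,\epsilon^{2}\, v^{\top} H v
  \;\le\; \tfrac{1}{2}\,\epsilon^{2}\, \lambda_{\max}(H),
  \qquad \|v\|_2 = 1,\quad \nabla L(\theta^{*}) \approx 0
```

A model with a larger λ_max therefore loses more accuracy under a weight perturbation of the same size, which is the usual meaning of a "sharp" minimum.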

Proposed Method

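Compute the true Hessian spectrum directly, without approximation, by back-propagating second derivatives.
Compare the Hessian spectrum and the perturbed loss landscape of models trained with small and large batches.
Analyze adversarial perturbations with FGSM and a second-order attack across several architectures and datasets.
Show that, under certain conditions, the inner robust-optimization problem is saddle-free almost everywhere.
Relate robust training to changes in the Hessian spectrum through both empirical and theoretical analysis.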
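A common way to obtain the top of the Hessian spectrum without forming the Hessian is to compute Hessian-vector products Hv (which back-propagation of second derivatives provides) and run power iteration on them. The sketch below assumes a logistic-regression model as a hypothetical stand-in for a neural network, because its Hv has an exact closed form; the helper names `hvp` and `top_eigenvalue` are illustrative, not the authors' code. Power iteration returns the eigenvalue of largest magnitude, which for a non-convex network loss could in principle be negative.

```python
import math
import random

# Hypothetical stand-in: mean logistic loss over inputs X at weights w.
# Its exact Hessian is H = (1/n) * sum_i p_i (1 - p_i) x_i x_i^T.


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def hvp(w, X, v):
    """Return H v without forming H: (1/n) * sum_i p_i (1 - p_i) (x_i . v) x_i."""
    n, d = len(X), len(w)
    out = [0.0] * d
    for x in X:
        p = sigmoid(dot(w, x))
        coeff = p * (1.0 - p) * dot(x, v) / n
        for j in range(d):
            out[j] += coeff * x[j]
    return out


def top_eigenvalue(w, X, iters=200, tol=1e-10, seed=0):
    """Power iteration on Hessian-vector products; returns (eigenvalue, unit eigenvector)."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in w]
    norm = math.sqrt(dot(v, v))
    v = [x / norm for x in v]
    lam = 0.0
    for _ in range(iters):
        hv = hvp(w, X, v)
        new_lam = dot(v, hv)  # Rayleigh quotient v^T H v, since ||v|| = 1
        norm = math.sqrt(dot(hv, hv))
        v = [x / norm for x in hv]
        converged = abs(new_lam - lam) < tol
        lam = new_lam
        if converged:
            break
    return lam, v


if __name__ == "__main__":
    X = [[1.0, 2.0], [0.5, -1.0], [-1.5, 0.3], [2.0, 1.0]]
    w = [0.1, -0.2]
    lam, v = top_eigenvalue(w, X)
    print("dominant Hessian eigenvalue:", lam)
```

Each iteration costs one Hessian-vector product, roughly the price of a couple of gradient evaluations in an autodiff framework, which is what makes this approach feasible for networks whose full Hessian would not fit in memory.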

Experimental Results

Research Questions

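RQ1: How does large-batch training affect the local geometry of the loss landscape compared with small-batch training?
RQ2: What is the connection between batch size and a model's robustness to adversarial perturbations?
RQ3: How does robust optimization affect the Hessian spectrum and the decision boundary?
RQ4: Is the inner loop of adversarial training a saddle-free optimization problem almost everywhere?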

Key Findings

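Large-batch training converges to regions with a noticeably larger Hessian spectrum, on both the training loss and the test loss.
Points reached by large-batch training are more vulnerable to adversarial attacks than those reached with small batches.
Robust training moves the model toward regions with a smaller Hessian spectrum, biasing it toward flat minima.
Under the stated assumptions, the inner adversarial-perturbation problem is saddle-free almost everywhere.
Robust optimization improves adversarial robustness but can reduce accuracy on clean data.
Adversarial training can reshape the Hessian spectrum to yield a lower-curvature model, even when the curvature of the overall loss remains positive.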
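The FGSM attack used to probe these models can be sketched in a few lines. This is a minimal illustration, assuming a binary logistic-regression model as a hypothetical stand-in for the paper's networks, because its input gradient has the closed form (p - y) w; the function names are illustrative. Adversarial (robust) training would take its parameter update on the loss at `x_adv` instead of `x`.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Probability of class 1 under a logistic model (hypothetical stand-in)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def logistic_loss(w, b, x, y):
    """Binary cross-entropy on one example, label y in {0, 1}."""
    p = predict_proba(w, b, x)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def input_gradient(w, b, x, y):
    """Gradient of the loss with respect to the input x: (p - y) * w."""
    p = predict_proba(w, b, x)
    return [(p - y) * wi for wi in w]


def sign(t):
    return (t > 0) - (t < 0)


def fgsm(w, b, x, y, eps):
    """One FGSM step: x_adv = x + eps * sign(grad_x loss), an L-infinity perturbation of size eps."""
    g = input_gradient(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]


if __name__ == "__main__":
    w, b = [1.5, -2.0], 0.1
    x, y = [0.8, 0.2], 1
    x_adv = fgsm(w, b, x, y, eps=0.1)
    print("clean loss:", logistic_loss(w, b, x, y))
    print("adversarial loss:", logistic_loss(w, b, x_adv, y))
```

For a linear model, this step moves the logit by exactly ε times the L1 norm of w in the direction that hurts the true class, so the adversarial loss is always at least the clean loss; for deep networks FGSM is only a first-order approximation of the worst case.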


This review was generated by AI and checked by a human editor.