QUICK REVIEW

[Paper Review] Structured Bayesian Pruning via Log-Normal Multiplicative Noise

Kirill Neklyudov, Dmitry Molchanov | arXiv (Cornell University) | 2017-05-20

Bayesian Methods and Mixture Models | Citations: 68

One-Line Summary

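A dropout-like Bayesian layer implementing Structured Bayesian Pruning (SBP). It multiplies layer outputs by log-normal noise and prunes units with a low signal-to-noise ratio (SNR). This gives practical acceleration of CNNs and fully connected networks without a large loss in accuracy.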

ABSTRACT

Dropout-based regularization methods can be regarded as injecting random noise with pre-defined magnitude to different parts of the neural network during training. It was recently shown that Bayesian dropout procedure not only improves generalization but also leads to extremely sparse neural architectures by automatically setting the individual noise magnitude per weight. However, this sparsity can hardly be used for acceleration since it is unstructured. In the paper, we propose a new Bayesian model that takes into account the computational structure of neural networks and provides structured sparsity, e.g. removes neurons and/or convolutional channels in CNNs. To do this we inject noise to the neurons' outputs while keeping the weights unregularized. We establish the probabilistic model with a proper truncated log-uniform prior over the noise and truncated log-normal variational approximation that ensures that the KL-term in the evidence lower bound is computed in closed-form. The model leads to structured sparsity by removing elements with a low SNR from the computation graph and provides significant acceleration on a number of deep neural architectures. The model is easy to implement as it can be formulated as a separate dropout-like layer.

Motivation and Goals

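Develop a Bayesian regularization framework that yields structured sparsity in neural networks.
Enable the removal of entire neurons or convolutional channels to speed up inference.
Provide a tractable variational inference approach with a proper prior over the multiplicative noise.
Demonstrate practical acceleration on LeNet and VGG-like architectures trained on MNIST and CIFAR-10.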

Proposed Method

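Introduce a dropout-like SBP layer that multiplies neuron outputs by a noise variable theta.
Place a sparsity-inducing log-uniform prior on theta and approximate its posterior with a truncated log-normal distribution.
Derive a closed-form KL divergence between q(theta | mu, sigma) and p(theta); truncating both distributions guarantees a proper probabilistic model.
Apply stochastic variational inference with the reparameterization trick to learn the variational parameters (mu, sigma) together with the network weights.
At test time, replace theta with its expectation E[theta] so that a single forward pass suffices, with no Bayesian ensembling.
Prune groups (neurons/filters) whose theta components have a signal-to-noise ratio (SNR) below a threshold.
Extend SBP to multidimensional tensors (e.g., CNN channels) by sharing theta across each group, which induces structured sparsity.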
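The steps above can be sketched in plain Python. This is a minimal, hypothetical illustration under stated assumptions, not the authors' implementation: the truncation bounds (a = -20, b = 0 in the usage below), the SNR threshold of 1.0, and all function names (`sample_log_theta`, `theta_moments`, `snr`, `sbp_forward`) are illustrative choices. During training, log theta is drawn from a normal distribution truncated to [a, b] with inverse-CDF reparameterization. At test time, each unit is either scaled by E[theta] or zeroed out if its SNR falls below the threshold. The moments use the standard truncated-normal identity E[theta] = exp(mu + sigma^2/2) * (Phi(beta - sigma) - Phi(alpha - sigma)) / (Phi(beta) - Phi(alpha)), with alpha = (a - mu)/sigma and beta = (b - mu)/sigma. That identity is derived here, not quoted from the paper.

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()  # standard normal; only its inv_cdf is used


def phi_cdf(x: float) -> float:
    """Standard normal CDF via erfc, which stays accurate deep in the left tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def sample_log_theta(mu: float, sigma: float, a: float, b: float, rng=random) -> float:
    """Reparameterized draw of log(theta) ~ N(mu, sigma^2) truncated to [a, b].

    Inverse-CDF trick: x = mu + sigma * Phi^{-1}(Phi(alpha) + u * Z), u ~ U(0, 1),
    so x is a deterministic function of (mu, sigma) once the noise u is fixed.
    """
    alpha, beta = (a - mu) / sigma, (b - mu) / sigma
    lo, hi = phi_cdf(alpha), phi_cdf(beta)
    p = lo + rng.random() * (hi - lo)
    p = min(max(p, 1e-300), 1.0 - 1e-12)  # inv_cdf requires 0 < p < 1
    x = mu + sigma * _STD.inv_cdf(p)
    return min(max(x, a), b)  # clip tiny round-off outside [a, b]


def theta_moments(mu, sigma, a, b):
    """Return (E[theta], E[theta^2]) for theta = exp(x), x ~ N(mu, sigma^2) truncated to [a, b]."""
    alpha, beta = (a - mu) / sigma, (b - mu) / sigma
    z = phi_cdf(beta) - phi_cdf(alpha)
    m1 = math.exp(mu + sigma**2 / 2) * (phi_cdf(beta - sigma) - phi_cdf(alpha - sigma)) / z
    m2 = math.exp(2 * mu + 2 * sigma**2) * (
        phi_cdf(beta - 2 * sigma) - phi_cdf(alpha - 2 * sigma)) / z
    return m1, m2


def snr(mu, sigma, a, b):
    """Signal-to-noise ratio E[theta] / std[theta] of one noise component."""
    m1, m2 = theta_moments(mu, sigma, a, b)
    return m1 / math.sqrt(max(m2 - m1**2, 1e-30))


def sbp_forward(x, mus, sigmas, a, b, train=True, snr_threshold=1.0, rng=random):
    """Hypothetical SBP-style layer: scale each unit's output by its own theta."""
    out = []
    for xi, mu, s in zip(x, mus, sigmas):
        if train:
            theta = math.exp(sample_log_theta(mu, s, a, b, rng))  # stochastic noise
        elif snr(mu, s, a, b) < snr_threshold:
            theta = 0.0  # low-SNR unit: pruned from the computation
        else:
            theta = theta_moments(mu, s, a, b)[0]  # deterministic E[theta]
        out.append(xi * theta)
    return out
```

For example, `sbp_forward([2.0, 3.0], [-1.0, -10.0], [0.1, 3.0], -20.0, 0.0, train=False)` keeps the first unit, whose noise is narrow (high SNR), and zeroes the second, whose noise is wide (low SNR). In a real network, `mu` and `sigma` would be trainable tensors updated by an autodiff framework. For convolutions, one theta would be shared across all spatial positions of a channel, which is what makes the pruning structured.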

Experimental Results

Research Questions

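RQ1. How can Bayesian dropout be adapted to produce structured sparsity patterns in neural networks?
RQ2. Can a tractable variational objective be derived under the improper log-uniform prior, and how does truncation affect training?
RQ3. Does SBP achieve practical acceleration on standard architectures and datasets by removing entire neurons or channels?
RQ4. How does training both the mean (mu) and the scale parameter (sigma) affect sparsity and performance, compared with keeping the mean fixed?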

Key Findings

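SBP achieves high group sparsity, which makes it possible to accelerate CNNs and fully connected networks without a significant drop in accuracy.
Jointly learning mu and sigma of the log-normal noise gives a tighter variational bound and higher sparsity than keeping the mean fixed.
Pruning the low-SNR theta components removes entire neurons/filters effectively, often with no loss in accuracy.
Applying SBP to LeNet and VGG-like networks on MNIST and CIFAR-10 yields practical speed-ups on CPU and GPU as well as FLOP reductions, with competitive accuracy.
Pairing a truncated log-normal posterior with a truncated log-uniform prior gives a well-defined, tractable ELBO and avoids the problems caused by improper priors.
The SBP layer can be inserted as a lightweight dropout-like module with minimal changes to existing code.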
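The closed-form KL term behind the tractable ELBO can be checked numerically. KL divergence is invariant under the change of variables theta -> log theta, and a log-uniform density (proportional to 1/theta) corresponds to a uniform density on log theta. The KL therefore reduces to the KL between a normal truncated to [a, b] and a uniform distribution on [a, b]. That is log(b - a) minus the entropy of the truncated normal: KL = log((b - a) / (sqrt(2*pi*e) * sigma)) - log Z - (alpha*phi(alpha) - beta*phi(beta)) / (2Z), with alpha = (a - mu)/sigma, beta = (b - mu)/sigma, and Z = Phi(beta) - Phi(alpha). The sketch below is a minimal illustration, not the paper's code. It implements that expression and compares it with a Monte Carlo estimate. The function names and the example bounds are assumptions.

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()  # standard normal; only its inv_cdf is used


def phi_cdf(x: float) -> float:
    """Standard normal CDF, computed with erfc for accuracy in the left tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def phi_pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def kl_closed_form(mu: float, sigma: float, a: float, b: float) -> float:
    """KL(q || p), where log(theta) ~ N(mu, sigma^2) truncated to [a, b] under q
    and log(theta) ~ Uniform[a, b] under p (the truncated log-uniform prior)."""
    alpha, beta = (a - mu) / sigma, (b - mu) / sigma
    z = phi_cdf(beta) - phi_cdf(alpha)
    return (math.log((b - a) / (math.sqrt(2.0 * math.pi * math.e) * sigma))
            - math.log(z)
            - (alpha * phi_pdf(alpha) - beta * phi_pdf(beta)) / (2.0 * z))


def kl_monte_carlo(mu: float, sigma: float, a: float, b: float, n: int = 20000, rng=random) -> float:
    """Monte Carlo estimate of the same KL, used only as a sanity check."""
    alpha, beta = (a - mu) / sigma, (b - mu) / sigma
    lo, hi = phi_cdf(alpha), phi_cdf(beta)
    z = hi - lo
    total = 0.0
    for _ in range(n):
        p = min(max(lo + rng.random() * z, 1e-300), 1.0 - 1e-12)  # keep inv_cdf in (0, 1)
        x = mu + sigma * _STD.inv_cdf(p)  # sample of log(theta) from q
        t = (x - mu) / sigma
        # log density of the truncated normal, written in log space to avoid underflow
        log_q = -0.5 * t * t - 0.5 * math.log(2.0 * math.pi) - math.log(sigma) - math.log(z)
        log_p = -math.log(b - a)  # uniform density on [a, b]
        total += log_q - log_p
    return total / n
```

The `log(b - a)` term shows why truncation matters. If the lower bound a is pushed toward minus infinity (the untruncated, improper log-uniform prior), this term diverges and the KL is no longer well defined.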

This review was generated by AI and reviewed by a human editor.