QUICK REVIEW

[Paper Review] Exploring Sparsity in Recurrent Neural Networks

Sharan Narang, Erich Elsen | arXiv (Cornell University) | 2017-04-17

Advanced Neural Network Applications | References: 16 | Citations: 145

One-Line Summary

This paper presents a pruning-based method that gradually sets weights to zero during RNN training, yielding highly sparse models that maintain or improve accuracy and deliver notable speed-ups.

ABSTRACT

Recurrent Neural Networks (RNN) are widely used to solve a variety of problems and as the quantity of data and the amount of available compute have increased, so have model sizes. The number of parameters in recent state-of-the-art networks makes them hard to deploy, especially on mobile phones and embedded devices. The challenge is due to both the size of the model and the time it takes to evaluate it. In order to deploy these RNNs efficiently, we propose a technique to reduce the parameters of a network by pruning weights during the initial training of the network. At the end of training, the parameters of the network are sparse while accuracy is still close to the original dense neural network. The network size is reduced by 8x and the time required to train the model remains constant. Additionally, we can prune a larger dense network to achieve better than baseline performance while still reducing the total number of parameters significantly. Pruning RNNs reduces the size of the model and can also help achieve significant inference time speed-up using sparse matrix multiply. Benchmarks show that using our technique model size can be reduced by 90% and speed-up is around 2x to 7x.

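Motivation and Objectives

Reduce the number of RNN parameters so that models can be deployed on mobile and embedded devices.
Develop an in-training pruning method that yields sparse weight matrices without additional retraining.
Demonstrate that pruning can shrink model size while maintaining or improving accuracy.
Quantify the inference speed-up of sparse recurrent layers and discuss the implications for deployment.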

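Proposed Method

Maintain a mask for each weight and use a monotonically increasing pruning threshold.
Prune weights by zeroing every parameter whose magnitude falls below the threshold, which is updated periodically during training.
Use a per-layer threshold function governed by a small set of hyperparameters (start_itr, ramp_itr, end_itr, theta, phi, freq).
Prune the recurrent and linear layers, but not the biases or batch normalization parameters.
Compare gradual pruning against hard pruning and against larger dense baselines to assess whether accuracy is recovered.
Demonstrate applicability to GRU and vanilla RNN architectures within the Deep Speech 2 framework.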
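
The mask-and-threshold procedure described above can be sketched in a few lines of plain Python. This is an illustration, not the authors' implementation: the piecewise-linear form of `threshold` (slope `theta` until `ramp_itr`, then slope `phi`, frozen after `end_itr`) is an assumption built from the hyperparameter names listed in this review, and the helper names `PruneSchedule`, `update_mask`, `apply_mask`, and `sparsity` are hypothetical. The toy weights and schedule values are made up for the demo, and no gradient step is performed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PruneSchedule:
    start_itr: int  # iteration at which pruning begins
    ramp_itr: int   # iteration at which the threshold slope changes
    end_itr: int    # iteration after which the threshold stops growing
    theta: float    # assumed slope of the threshold before ramp_itr
    phi: float      # assumed slope of the threshold after ramp_itr
    freq: int       # the mask is refreshed every `freq` iterations


def threshold(s: PruneSchedule, itr: int) -> float:
    """Assumed piecewise-linear threshold; non-decreasing for theta, phi >= 0."""
    if itr < s.start_itr:
        return 0.0
    itr = min(itr, s.end_itr)
    first = s.theta * (min(itr, s.ramp_itr) - s.start_itr) / s.freq
    second = s.phi * max(0, itr - s.ramp_itr) / s.freq
    return first + second


def update_mask(weights: List[float], mask: List[bool],
                s: PruneSchedule, itr: int) -> List[bool]:
    """Refresh the mask every `freq` iterations; pruned weights stay pruned."""
    in_window = s.start_itr <= itr <= s.end_itr
    if in_window and (itr - s.start_itr) % s.freq == 0:
        eps = threshold(s, itr)
        return [m and abs(w) >= eps for w, m in zip(weights, mask)]
    return mask


def apply_mask(weights: List[float], mask: List[bool]) -> List[float]:
    """Zero out every weight whose mask entry is False."""
    return [w if m else 0.0 for w, m in zip(weights, mask)]


def sparsity(mask: List[bool]) -> float:
    """Fraction of weights that have been pruned."""
    return 1.0 - sum(mask) / len(mask)


# Toy values chosen only for illustration.
sched = PruneSchedule(start_itr=2, ramp_itr=6, end_itr=10,
                      theta=0.1, phi=0.2, freq=2)
weights = [0.05, -0.45, 0.9, -0.02, 0.3, -0.7, 0.15, 0.65]
mask = [True] * len(weights)

for itr in range(12):
    # A real training step would update `weights` here; omitted in this sketch.
    mask = update_mask(weights, mask, sched, itr)
    weights = apply_mask(weights, mask)

print("final sparsity:", sparsity(mask))
```

Because the threshold only grows and a masked weight is held at zero, the set of pruned weights can only expand over training, which is what makes the pruning gradual rather than a single hard cut.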

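Experimental Results

Research Questions

RQ1: Can in-training pruning of RNN weights achieve high sparsity with minimal loss of accuracy?
RQ2: How does gradual pruning compare with hard pruning in terms of final performance and parameter reduction?
RQ3: What practical deployment benefits (memory, bandwidth, speed) do sparse RNNs offer?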

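Key Results

Pruning achieves roughly 88% to 92% sparsity in the recurrent and linear layers.
Pruned larger models (e.g., 2560–3072 hidden units) match or outperform the dense baseline while using far fewer parameters.
Gradual pruning performs about 7%–9% better than hard pruning at the same parameter count.
Sparse RNNs show substantial memory compression (Deep Speech 2: from 268 MB to about 32–64 MB; GRU: from 460 MB to about 50 MB).
GEMM/SpMV benchmarks show a 3x to 7x speed-up for recurrent layers at high sparsity, depending on layer size and on whether a GRU or a vanilla RNN is used.
Pruning reduces training time relative to some prior methods and, combined with quantization, can enable on-device deployment.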
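
To see why a pruned recurrent layer can be both smaller and faster, the following sketch stores a weight matrix in compressed sparse row (CSR) form and multiplies it by a vector. It is not the benchmark code behind the numbers above; the function names `to_csr`, `spmv`, and `dense_mv` and the tiny 4x4 matrix are hypothetical, chosen only to show that storage and multiply-add work scale with the number of nonzeros rather than with the full matrix size.

```python
from typing import List, Tuple


def to_csr(dense: List[List[float]]) -> Tuple[List[float], List[int], List[int]]:
    """Convert a dense row-major matrix to CSR (values, col_idx, row_ptr)."""
    values: List[float] = []
    col_idx: List[int] = []
    row_ptr: List[int] = [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0.0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))  # end of this row's nonzeros
    return values, col_idx, row_ptr


def spmv(values: List[float], col_idx: List[int],
         row_ptr: List[int], x: List[float]) -> List[float]:
    """Sparse matrix-vector product: one multiply-add per stored nonzero."""
    y: List[float] = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


def dense_mv(dense: List[List[float]], x: List[float]) -> List[float]:
    """Dense reference product: one multiply-add per matrix entry."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in dense]


# Toy pruned weight matrix (illustrative values only).
W = [
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [1.5, 0.0, 0.0, -1.0],
    [0.0, 0.0, 0.0, 0.0],
]
x = [1.0, 2.0, 3.0, 4.0]

values, col_idx, row_ptr = to_csr(W)
dense_ops = len(W) * len(W[0])   # multiply-adds in the dense product
sparse_ops = len(values)         # multiply-adds in the sparse product
print("sparse result:", spmv(values, col_idx, row_ptr, x))
print("multiply-adds (dense vs sparse):", dense_ops, sparse_ops)
```

In practice the achievable speed-up is smaller than the raw reduction in multiply-adds, since sparse formats carry index overhead and irregular memory access, which is consistent with a reported speed-up well below the parameter reduction.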


This review was generated by AI and reviewed by a human editor.