QUICK REVIEW

[Paper Review] WoodFisher: Efficient Second-Order Approximation for Neural Network Compression

Sidak Pal Singh, Dan Alistarh | arXiv (Cornell University) | 2020. 04. 29.

Advanced Neural Network Applications | References: 48 | Citations: 36

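One-line summary

WoodFisher introduces an efficient approximation of the inverse Hessian, built from the empirical Fisher and the Woodbury identity, which makes second-order pruning practical. It delivers state-of-the-art one-shot pruning and competitive gradual pruning for CNNs on ImageNet and CIFAR10.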

ABSTRACT

Second-order information, in the form of Hessian- or Inverse-Hessian-vector products, is a fundamental tool for solving optimization problems. Recently, there has been significant interest in utilizing this information in the context of deep neural networks; however, relatively little is known about the quality of existing approximations in this context. Our work examines this question, identifies issues with existing approaches, and proposes a method called WoodFisher to compute a faithful and efficient estimate of the inverse Hessian. Our main application is to neural network compression, where we build on the classic Optimal Brain Damage/Surgeon framework. We demonstrate that WoodFisher significantly outperforms popular state-of-the-art methods for one-shot pruning. Further, even when iterative, gradual pruning is considered, our method results in a gain in test accuracy over the state-of-the-art approaches, for pruning popular neural networks (like ResNet-50, MobileNetV1) trained on standard image classification datasets such as ImageNet ILSVRC. We examine how our method can be extended to take into account first-order information, as well as illustrate its ability to automatically set layer-wise pruning thresholds and perform compression in the limited-data regime. The code is available at the following link, https://github.com/IST-DASLab/WoodFisher.

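Research motivation and goals

Establish whether second-order information can be both accurate and scalable for neural networks.
Develop an efficient method for estimating inverse-Hessian information that is suitable for large models.
Apply this method to neural network compression within the Optimal Brain Damage/Surgeon framework.
Demonstrate improved one-shot and gradual pruning performance over state-of-the-art methods.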

Proposed method

The Hessian is approximated by the empirical Fisher, and the inverse-Fisher estimate is updated iteratively using the Woodbury identity.
The empirical Fisher is updated recursively as F_hat_{n+1} = F_hat_n + (1/N) grad(l_{n+1}) grad(l_{n+1})^T, starting from the dampening term F_hat_0 = λ I_d.
The inverse is computed through the Woodbury update: F_hat_{n+1}^{-1} = F_hat_n^{-1} - (F_hat_n^{-1} grad(l_{n+1}) grad(l_{n+1})^T F_hat_n^{-1}) / (N + grad(l_{n+1})^T F_hat_n^{-1} grad(l_{n+1})), starting from F_hat_0^{-1} = (1/λ) I_d.
A block-wise (chunked) approximation is introduced to scale to large models, giving a running time of O(m c d) for block size c and total parameter count d.
A pruning statistic ρ_q = w_q^2 / (2 [H^{-1}]_{qq}) is defined to rank parameters for removal, and it is used for either layer-wise pruning (independent WoodFisher) or global pruning (joint WoodFisher).
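The method is extended to include the first-order (gradient) term, and pruning in the limited-data regime and automatic layer-wise sparsity thresholds are discussed.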
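
The recursion and the pruning statistic listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names `woodfisher_inverse` and `pruning_statistic`, the parameter name `damp` (standing in for λ), and the random stand-in gradients are all assumptions made for the example, and the full d x d inverse is kept in memory, which only works for small d.

```python
import numpy as np

def woodfisher_inverse(grads, damp=1e-3):
    """Inverse of the dampened empirical Fisher via rank-one Woodbury updates.

    grads: array of shape (N, d), one per-sample loss gradient per row.
    damp:  dampening constant (lambda), so that F_hat_0 = damp * I_d.
    """
    n_samples, d = grads.shape
    f_inv = np.eye(d) / damp                 # F_hat_0^{-1} = (1/lambda) I_d
    for g in grads:
        fg = f_inv @ g                       # F_hat_n^{-1} grad(l_{n+1})
        # f_inv is symmetric, so F^{-1} g g^T F^{-1} equals outer(fg, fg).
        f_inv -= np.outer(fg, fg) / (n_samples + g @ fg)
    return f_inv

def pruning_statistic(weights, f_inv):
    """rho_q = w_q^2 / (2 [H^{-1}]_{qq}), with H^{-1} replaced by f_inv."""
    return weights ** 2 / (2.0 * np.diag(f_inv))

# Hypothetical toy data: random vectors stand in for real gradients and weights.
rng = np.random.default_rng(0)
grads = rng.normal(size=(64, 8))
weights = rng.normal(size=8)

f_inv = woodfisher_inverse(grads, damp=1e-3)
rho = pruning_statistic(weights, f_inv)
prune_order = np.argsort(rho)                # smallest rho first: cheapest to remove
```

Each update costs O(d^2), which is why a dense inverse does not scale. Under the block-wise approximation described above, the same routine would be applied independently to chunks of c parameters rather than to all d at once.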

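Experimental results

Research questions

RQ1: Can inverse-Hessian information (a second-order approximation) be both accurate and scalable for modern neural networks?
RQ2: Is the empirical Fisher a practical and reliable proxy for the Hessian in large-scale pruning tasks?
RQ3: Can WoodFisher-based pruning outperform magnitude-based and diagonal-Fisher baselines in both one-shot and gradual pruning settings?
RQ4: Does joint (global) sparsity targeting improve compression over layer-wise pruning?
RQ5: Can WoodFisher be extended to limited-data scenarios, and can first-order information be incorporated so that pruning is possible before full convergence?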

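Key results

WoodFisher clearly outperforms magnitude-based pruning and diagonal-Fisher baselines in one-shot pruning on ResNet-20/CIFAR-10 and ResNet-50/ImageNet.
Joint WoodFisher (global sparsity targeting) usually performs better than independent (layer-wise) WoodFisher, especially at higher sparsity levels.
The chunked block approximation preserves pruning quality while remaining efficient in practice, and larger block sizes improve accuracy.
In the gradual pruning setting with retraining, WoodFisher matches the leading state-of-the-art pruning methods and in some cases surpasses them.
Empirical evidence suggests that the local quadratic model built with WoodFisher closely predicts the change in loss along the pruning direction, which supports the quality of the approximation.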
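
For reference, the local quadratic model behind the last finding can be written out as follows. This is the standard Optimal Brain Surgeon derivation, not a transcription from the paper. It assumes the gradient is zero at the trained weights, and the symbols \delta w (weight perturbation), e_q (q-th unit vector), and \delta L (change in loss) are notation introduced here.

```latex
% Local quadratic model of the loss change, gradient assumed to vanish:
\delta L \;\approx\; \tfrac{1}{2}\, \delta w^{\top} H \, \delta w

% Removing weight q imposes the constraint
e_q^{\top} \delta w + w_q = 0

% Minimizing the quadratic model under this constraint (Lagrange multiplier) gives
\delta w^{*} \;=\; -\,\frac{w_q}{[H^{-1}]_{qq}}\; H^{-1} e_q ,
\qquad
\rho_q \;=\; \delta L(\delta w^{*}) \;=\; \frac{w_q^{2}}{2\,[H^{-1}]_{qq}}
```

The resulting \rho_q is the pruning statistic used to rank parameters, so the quantity compared against the measured loss change is the value of this quadratic model along \delta w^{*}, with H^{-1} replaced by the WoodFisher estimate.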


This review was generated by AI and checked by a human editor.