QUICK REVIEW

[Paper Review] Stabilizing the Lottery Ticket Hypothesis

Jonathan Frankle, Gintare Karolina Dziugaite|arXiv (Cornell University)|2019. 03. 05.

Advanced Neural Network Applications|References: 34|Citations: 146

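One-line summary

This paper shows that pruning with IMP while rewinding to a point a few percent into training, rather than to initialization, yields highly sparse subnetworks that match or exceed the original network's accuracy on CIFAR-10 and ImageNet, and it proposes stability as the central explanation.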

ABSTRACT

Pruning is a well-established technique for removing unnecessary structure from neural networks after training to improve the performance of inference. Several recent results have explored the possibility of pruning at initialization time to provide similar benefits during training. In particular, the "lottery ticket hypothesis" conjectures that typical neural networks contain small subnetworks that can train to similar accuracy in a commensurate number of steps. The evidence for this claim is that a procedure based on iterative magnitude pruning (IMP) reliably finds such subnetworks retroactively on small vision tasks. However, IMP fails on deeper networks, and proposed methods to prune before training or train pruned networks encounter similar scaling limitations. In this paper, we argue that these efforts have struggled on deeper networks because they have focused on pruning precisely at initialization. We modify IMP to search for subnetworks that could have been obtained by pruning early in training (0.1% to 7% through) rather than at iteration 0. With this change, it finds small subnetworks of deeper networks (e.g., 80% sparsity on Resnet-50) that can complete the training process to match the accuracy of the original network on more challenging tasks (e.g., ImageNet). In situations where IMP fails at iteration 0, the accuracy benefits of delaying pruning accrue rapidly over the earliest iterations of training. To explain these behaviors, we study subnetwork "stability," finding that - as accuracy improves in this fashion - IMP subnetworks train to parameters closer to those of the full network and do so with improved consistency in the face of gradient noise. These results offer new insights into the opportunity to prune large-scale networks early in training and the behaviors underlying the lottery ticket hypothesis.

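Motivation and objectives

Investigate why pruning at initialization fails on deeper networks and whether pruning early in training can produce trainable subnetworks.
Evaluate how rewinding to an early training iteration affects subnetwork accuracy and stability.
Introduce and analyze stability to pruning and stability to data order as mechanisms that influence lottery tickets.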

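Proposed method

Modification: iterative magnitude pruning (IMP) rewinds the surviving weights to their values early in training (k% of the way through) rather than to their values at initialization.
Evaluates IMP with and without rewinding on Lenet (MNIST) and on Resnet-18 and VGG-19 (CIFAR-10), comparing against random pruning.
Measures two forms of stability, stability to pruning and stability to data order, as the L2 distance between masked weights after training.
Extends the rewinding experiments to large-scale ImageNet models (Resnet-50, Inception-v3, SqueezeNet).
Analyzes how rewinding to later iterations improves stability and accuracy, and how this relates to the Lottery Ticket Hypothesis.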
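The rewinding procedure described above can be sketched on a toy problem. This is a minimal illustration, not the authors' implementation: the linear least-squares model, the helper names (`make_toy_data`, `train`, `prune_by_magnitude`, `imp_with_rewinding`), the learning rate, and the 20% per-round pruning fraction are all assumptions chosen for brevity. The only idea taken from the text is that each round trains to completion, prunes by magnitude, and then resets the surviving weights to their values at an early step k instead of step 0.

```python
import random

def make_toy_data(n_samples, n_features, seed=0):
    """Hypothetical toy regression data; only a few features matter."""
    rng = random.Random(seed)
    true_w = [1.0 if i < 3 else 0.0 for i in range(n_features)]
    data = []
    for _ in range(n_samples):
        x = [rng.uniform(-1.0, 1.0) for _ in range(n_features)]
        y = sum(wi * xi for wi, xi in zip(true_w, x))
        data.append((x, y))
    return data

def train(weights, mask, data, start, stop, lr=0.05):
    """Run SGD steps start..stop-1 on a linear model; masked weights stay at zero."""
    w = [wi * mi for wi, mi in zip(weights, mask)]
    for t in range(start, stop):
        x, y = data[t % len(data)]
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        w = [(wi - lr * err * xi) * mi for wi, xi, mi in zip(w, x, mask)]
    return w

def prune_by_magnitude(weights, mask, fraction):
    """Remove the smallest-magnitude `fraction` of the weights still alive."""
    alive = [i for i, m in enumerate(mask) if m == 1]
    n_prune = int(len(alive) * fraction)
    new_mask = list(mask)
    for i in sorted(alive, key=lambda i: abs(weights[i]))[:n_prune]:
        new_mask[i] = 0
    return new_mask

def imp_with_rewinding(init_weights, data, total_steps, rewind_step,
                       rounds, fraction=0.2):
    """IMP that resets to the weights at `rewind_step` instead of step 0."""
    mask = [1] * len(init_weights)
    # Train the dense model up to the rewind point once and remember W_k.
    rewind_weights = train(init_weights, mask, data, 0, rewind_step)
    for _ in range(rounds):
        # Train the current subnetwork from W_k to the end of training.
        final = train(rewind_weights, mask, data, rewind_step, total_steps)
        # Prune by final magnitude; the next round restarts from W_k again.
        mask = prune_by_magnitude(final, mask, fraction)
    return mask, [w * m for w, m in zip(rewind_weights, mask)]

data = make_toy_data(n_samples=50, n_features=10)
init = [random.Random(1).uniform(-0.1, 0.1) for _ in range(10)]
mask, rewound = imp_with_rewinding(init, data, total_steps=200,
                                   rewind_step=10, rounds=3)
```

Setting `rewind_step=0` in this sketch reduces it to IMP with a reset to initialization, which is the baseline the review contrasts against.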

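Experimental results

Research questions

RQ1: Can subnetworks identified by IMP at initialization train to comparable accuracy in deeper networks?
RQ2: Does pruning at a later point early in training yield smaller trainable subnetworks that match or exceed the original network's performance?
RQ3: Is subnetwork stability (to pruning and to data order) a predictor of finding winning tickets?
RQ4: How does rewinding affect high-sparsity subnetworks on large-scale tasks such as ImageNet?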

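Key findings

Without learning-rate adjustments, IMP fails to find winning tickets at initialization in deeper networks such as Resnet-18 and VGG-19.
Rewinding to an early training iteration (0.1%–7% of the way through) yields subnetworks at 50%–99% sparsity that match the full network's accuracy on CIFAR-10.
On ImageNet, rewinding to 4.4%, 3.5%, and 6.6% of the way through training yields subnetworks that are 70%, 70%, and 50% smaller for Resnet-50, Inception-v3, and SqueezeNet, respectively, while matching the original accuracy.
Subnetworks found by IMP are far more stable to pruning and to data order than randomly pruned subnetworks, and this stability correlates with higher accuracy.
Rewinding to later iterations consistently improves the stability and accuracy of subnetworks that did not initially yield a winning ticket.
These results suggest a revised Lottery Ticket Hypothesis with rewinding and point to an opportunity to prune early in training rather than at initialization.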
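The stability measurements referred to in this review compare trained weights restricted to a subnetwork's mask. A minimal sketch of such a distance follows; the function name `masked_l2_distance` and the flat-list weight representation are assumptions for illustration, and any normalization the paper may apply is not covered here. For stability to pruning, the two inputs would be the full network's trained weights and the subnetwork's trained weights; for stability to data order, they would be two copies of the subnetwork trained with different data orders.

```python
import math

def masked_l2_distance(weights_a, weights_b, mask):
    """L2 distance between two weight vectors, counting only unpruned positions."""
    return math.sqrt(sum(m * (a - b) ** 2
                         for a, b, m in zip(weights_a, weights_b, mask)))

# Hypothetical trained weights; the middle position is pruned (mask 0),
# so its large difference does not contribute to the distance.
full_net = [1.0, 2.0, 3.0]
subnet = [1.0, 0.0, 7.0]
mask = [1, 0, 1]
distance = masked_l2_distance(full_net, subnet, mask)  # sqrt(0 + 16) = 4.0
```

A smaller distance means the subnetwork ends training closer to the reference weights, which is the sense of "more stable" used in the findings.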


This review was generated by AI and checked by a human editor.