QUICK REVIEW

[Paper Review] Pruning neural networks without any data by iteratively conserving synaptic flow

Hidenori Tanaka, Daniel Kunin | arXiv (Cornell University) | 2020. 06. 09.

Advanced Memory and Neural Computing | Citations: 256

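One-line summary

This paper proposes SynFlow, a data-agnostic pruning method that conserves synaptic flow in order to avoid layer-collapse and reach Maximal Critical Compression, without any training or access to data.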

ABSTRACT

Pruning the parameters of deep neural networks has generated intense interest due to potential savings in time, memory and energy both during training and at test time. Recent works have identified, through an expensive sequence of training and pruning cycles, the existence of winning lottery tickets or sparse trainable subnetworks at initialization. This raises a foundational question: can we identify highly sparse trainable subnetworks at initialization, without ever training, or indeed without ever looking at the data? We provide an affirmative answer to this question through theory driven algorithm design. We first mathematically formulate and experimentally verify a conservation law that explains why existing gradient-based pruning algorithms at initialization suffer from layer-collapse, the premature pruning of an entire layer rendering a network untrainable. This theory also elucidates how layer-collapse can be entirely avoided, motivating a novel pruning algorithm Iterative Synaptic Flow Pruning (SynFlow). This algorithm can be interpreted as preserving the total flow of synaptic strengths through the network at initialization subject to a sparsity constraint. Notably, this algorithm makes no reference to the training data and consistently competes with or outperforms existing state-of-the-art pruning algorithms at initialization over a range of models (VGG and ResNet), datasets (CIFAR-10/100 and Tiny ImageNet), and sparsity constraints (up to 99.99 percent). Thus our data-agnostic pruning algorithm challenges the existing paradigm that, at initialization, data must be used to quantify which synapses are important.

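Research motivation and objectives

Explain and formalize why pruning at initialization suffers from layer-collapse.
Show that synaptic saliency is conserved at the level of individual neurons and of whole layers for networks with homogeneous activation functions such as ReLU.
Explain why larger layers receive smaller average scores, and how this drives layer-collapse in gradient-based methods.
Develop a data-agnostic pruning algorithm that achieves Maximal Critical Compression.
Demonstrate experimentally that SynFlow matches or exceeds state-of-the-art pruning at initialization without using any training data.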
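The conservation argument in these objectives can be written compactly. The block below is a sketch of the standard statement, assuming a scalar objective $\mathcal{R}$ and a bias-free feedforward network; the symbols $C$ (the shared layer total) and $N_l$ (the number of parameters in layer $l$) are notation introduced here for illustration.

```latex
% Synaptic saliency: Hadamard product of a gradient and the parameters
\mathcal{S}(\theta) = \frac{\partial \mathcal{R}}{\partial \theta} \odot \theta

% Neuron-wise conservation for a hidden neuron with a homogeneous activation:
% total saliency of incoming parameters equals that of outgoing parameters
\left\langle \frac{\partial \mathcal{R}}{\partial \theta^{\mathrm{in}}},\, \theta^{\mathrm{in}} \right\rangle
=
\left\langle \frac{\partial \mathcal{R}}{\partial \theta^{\mathrm{out}}},\, \theta^{\mathrm{out}} \right\rangle

% Layer-wise consequence: every layer carries the same total C,
% so the average score shrinks as the layer grows
\sum_{i=1}^{N_l} \mathcal{S}\!\left(\theta^{[l]}_i\right) = C
\quad \Longrightarrow \quad
\frac{1}{N_l} \sum_{i=1}^{N_l} \mathcal{S}\!\left(\theta^{[l]}_i\right) = \frac{C}{N_l}
```

Under a single global threshold, a layer whose average score is $C / N_l$ with large $N_l$ tends to lose parameters faster than a small layer, which is the mechanism the review links to layer-collapse.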

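Proposed method

Defines synaptic saliency as the Hadamard product of a gradient and the parameters, and shows that it obeys neuron-wise and network-wise conservation laws.
Shows that gradient-based scores share this conservation property, which explains their dependence on layer size.
Introduces Iterative Synaptic Flow Pruning (SynFlow), built on a data-agnostic loss that yields positive, conservative scores.
Proves that iterative, positive, conservative scoring under global masking satisfies Maximal Critical Compression.
Provides pseudocode for SynFlow and discusses its computational cost (100 pruning iterations).
Empirically compares SynFlow with SNIP, GraSP, and magnitude-based and random pruning across several models and datasets.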
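The method steps above can be illustrated with a minimal sketch. It assumes a bias-free fully connected network stored as nested Python lists, a data-free objective R = 1ᵀ|W_L|···|W_1|1 evaluated on an all-ones input, and an exponential schedule that keeps `total * compression ** (-k / n_iters)` weights after iteration `k`. The function names `synflow_scores` and `synflow_prune` and the layer sizes are hypothetical, chosen for illustration; this is not the authors' reference implementation.

```python
import random

def synflow_scores(weights, masks):
    """Score each weight as |w| * dR/d|w| for R = 1^T |W_L| ... |W_1| 1."""
    # Masked absolute weights, one (out x in) matrix per layer.
    abs_w = [[[abs(w) * m for w, m in zip(w_row, m_row)]
              for w_row, m_row in zip(W, M)]
             for W, M in zip(weights, masks)]
    # Forward pass on an all-ones input: fwd[l] is the input to layer l.
    fwd = [[1.0] * len(abs_w[0][0])]
    for A in abs_w:
        fwd.append([sum(a * x for a, x in zip(row, fwd[-1])) for row in A])
    # Backward pass from an all-ones output gradient:
    # after the loop, bwd[l + 1] is dR/d(output of layer l).
    bwd = [[1.0] * len(abs_w[-1])]
    for A in reversed(abs_w):
        out_grad = bwd[0]
        bwd.insert(0, [sum(A[i][j] * out_grad[i] for i in range(len(A)))
                       for j in range(len(A[0]))])
    return [[[bwd[l + 1][i] * A[i][j] * fwd[l][j] for j in range(len(A[0]))]
             for i in range(len(A))]
            for l, A in enumerate(abs_w)]

def synflow_prune(weights, compression, n_iters=100):
    """Iteratively re-score and apply one global top-k mask per iteration."""
    masks = [[[1.0] * len(row) for row in W] for W in weights]
    total = sum(len(row) for W in weights for row in W)
    for k in range(1, n_iters + 1):
        scores = synflow_scores(weights, masks)
        keep = round(total * compression ** (-k / n_iters))
        # Rank only weights that still carry flow (score > 0), across all layers.
        ranked = sorted(((s, l, i, j)
                         for l, S in enumerate(scores)
                         for i, row in enumerate(S)
                         for j, s in enumerate(row) if s > 0.0),
                        reverse=True)
        masks = [[[0.0] * len(row) for row in W] for W in weights]
        for _, l, i, j in ranked[:keep]:
            masks[l][i][j] = 1.0
    return masks

random.seed(0)
sizes = [8, 16, 16, 4]  # hypothetical layer widths
weights = [[[random.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
masks = synflow_prune(weights, compression=10.0)
remaining = [int(sum(sum(row) for row in M)) for M in masks]
print("weights kept per layer:", remaining)
```

Scores are recomputed from the current mask at every iteration, so a layer that has already been thinned out concentrates the same total flow on fewer weights and becomes harder to prune further. That re-scoring is what the review credits for avoiding layer-collapse; no training data appears anywhere in the loop.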

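Experimental results

Research questions

RQ1: Can highly sparse trainable subnetworks be identified at initialization without training or data?
RQ2: Why do gradient-based single-shot pruning methods cause layer-collapse, and how can it be mitigated?
RQ3: Can a data-agnostic pruning method reach Maximal Critical Compression while avoiding layer-collapse?
RQ4: How does iteratively re-evaluating pruning scores affect whether a network stays trainable after pruning at initialization?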

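Key findings

SynFlow uses no data, yet in the high-compression regime it consistently outperforms both the baselines and the data-dependent pruning methods.
Conservation laws for synaptic saliency hold at initialization, neuron-wise and network-wise, which explains why some scores lead to layer-collapse.
The inverse relationship between layer size and average layer score explains why gradient-based methods prune the largest layers first.
Iterative scoring with positive, conservative scores guarantees Maximal Critical Compression under global masking (no layer-collapse).
With data-agnostic pruning, SynFlow achieves state-of-the-art pruning performance across 12 model/dataset combinations.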
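The conservation findings above can be checked numerically on a toy case. The sketch below assumes a bias-free two-layer ReLU network with a squared-error loss and a hand-written backward pass; the sizes, seed, and variable names are illustrative choices, not taken from the paper.

```python
import math
import random

def matvec(W, x):
    return [sum(w * a for w, a in zip(row, x)) for row in W]

random.seed(1)
n_in, n_hid, n_out = 5, 7, 3  # hypothetical sizes
W1 = [[random.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_hid)]
W2 = [[random.gauss(0.0, 1.0) for _ in range(n_hid)] for _ in range(n_out)]
x = [random.gauss(0.0, 1.0) for _ in range(n_in)]
t = [random.gauss(0.0, 1.0) for _ in range(n_out)]

# Forward pass: y = W2 relu(W1 x), loss = 0.5 * ||y - t||^2.
z = matvec(W1, x)
h = [max(0.0, a) for a in z]
y = matvec(W2, h)

# Manual backward pass.
dy = [yi - ti for yi, ti in zip(y, t)]
dh = [sum(W2[i][k] * dy[i] for i in range(n_out)) for k in range(n_hid)]
dz = [dh[k] if z[k] > 0.0 else 0.0 for k in range(n_hid)]

# Synaptic saliency: gradient times parameter, elementwise.
S1 = [[dz[k] * x[j] * W1[k][j] for j in range(n_in)] for k in range(n_hid)]
S2 = [[dy[i] * h[k] * W2[i][k] for k in range(n_hid)] for i in range(n_out)]

# Neuron-wise check: incoming total equals outgoing total per hidden neuron.
incoming = [sum(S1[k]) for k in range(n_hid)]
outgoing = [sum(S2[i][k] for i in range(n_out)) for k in range(n_hid)]
neuron_ok = all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-9)
                for a, b in zip(incoming, outgoing))

# Layer-wise check: both layers carry the same total saliency,
# so the larger layer (35 weights vs 21) has the smaller average magnitude.
layer_totals = (sum(incoming), sum(outgoing))
layer_means = (layer_totals[0] / (n_hid * n_in),
               layer_totals[1] / (n_out * n_hid))
print("neuron-wise conservation holds:", neuron_ok)
print("layer totals:", layer_totals)
print("layer means:", layer_means)
```

Because the two layer totals agree up to floating-point error, the average magnitude per weight scales inversely with layer size, which is the relationship the review ties to large layers being pruned first by gradient-based scores.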


This review was generated by AI and checked by a human editor.