QUICK REVIEW

[Paper Review] Network Pruning via Transformable Architecture Search

Xuanyi Dong, Yi Yang | arXiv (Cornell University) | 2019-05-23

Advanced Malware Detection Techniques | References: 49 | Citations: 140

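One-Line Summary

The paper introduces Transformable Architecture Search (TAS), which uses differentiable NAS to learn the width and depth of a network for pruning, uses channel-wise interpolation to aggregate feature maps of different sizes, and applies knowledge distillation (KD) to transfer knowledge to the pruned architecture. Experiments on CIFAR-10/100 and ImageNet show improved performance over traditional pruning methods.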

ABSTRACT

Network pruning reduces the computation costs of an over-parameterized network without performance damage. Prevailing pruning algorithms pre-define the width and depth of the pruned networks, and then transfer parameters from the unpruned network to pruned networks. To break the structure limitation of the pruned networks, we propose to apply neural architecture search to search directly for a network with flexible channel and layer sizes. The number of the channels/layers is learned by minimizing the loss of the pruned networks. The feature map of the pruned network is an aggregation of K feature map fragments (generated by K networks of different sizes), which are sampled based on the probability distribution. The loss can be back-propagated not only to the network weights, but also to the parameterized distribution to explicitly tune the size of the channels/layers. Specifically, we apply channel-wise interpolation to keep the feature map with different channel sizes aligned in the aggregation procedure. The maximum probability for the size in each distribution serves as the width and depth of the pruned network, whose parameters are learned by knowledge transfer, e.g., knowledge distillation, from the original networks. Experiments on CIFAR-10, CIFAR-100 and ImageNet demonstrate the effectiveness of our new perspective of network pruning compared to traditional network pruning algorithms. Various searching and knowledge transfer approaches are conducted to show the effectiveness of the two components. Code is at: https://github.com/D-X-Y/NAS-Projects.

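Motivation and Goals

Pursue network pruning to cut the computation cost of over-parameterized CNNs without sacrificing accuracy.
Shift pruning from hand-fixed structures to learning the network size via NAS.
Optimize width and depth under a cost constraint so that the result respects a computation budget.
Use knowledge distillation (KD) to transfer knowledge from the unpruned network to the pruned architecture.
Demonstrate generality across datasets (CIFAR-10/100, ImageNet) and architectures.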

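Proposed Method

Introduce TAS to search the per-layer channel count and the per-stage depth as differentiable architecture parameters.
Assign a learnable distribution over the candidate channel counts and layer counts, and optimize it with Gumbel-Softmax so that back-propagation is possible.
Aggregate feature-map fragments of different sizes using channel-wise interpolation (CWI), with a weighted sum over the sampled sizes.
Compute the final output as a cumulative sum over depth, and back-propagate to both the width (alpha) and depth (beta) parameters.
Add a piecewise computation-cost term to the validation loss to steer the search toward the target FLOPs.
Use knowledge distillation (KD) to transfer knowledge from the unpruned network and improve the performance of the pruned architecture.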
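The width-search steps listed above (a learnable distribution over candidate channel counts, Gumbel-Softmax sampling, and CWI-based aggregation) can be illustrated with a small pure-Python sketch. It is not the authors' implementation. It assumes a single layer, represents each channel by one scalar instead of a spatial map, and realizes CWI as adaptive average pooling along the channel axis. The names `gumbel_softmax`, `channel_wise_interpolate`, and `aggregate` are hypothetical, and there is no automatic differentiation here, so the sketch only shows the forward computation.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_softmax(alpha, tau, rng):
    """Add Gumbel(0, 1) noise to the log-probabilities, then apply a tempered softmax."""
    log_p = [math.log(p) for p in softmax(alpha)]
    # Gumbel noise is -log(-log(u)) with u drawn from (0, 1)
    noise = [-math.log(-math.log(rng.uniform(1e-9, 1.0 - 1e-9))) for _ in alpha]
    return softmax([(lp + g) / tau for lp, g in zip(log_p, noise)])


def channel_wise_interpolate(channels, out_size):
    """Resize a channel list to out_size entries by adaptive average pooling.

    Assumption: each channel is reduced to one scalar for readability.
    """
    in_size = len(channels)
    out = []
    for i in range(out_size):
        start = (i * in_size) // out_size                      # floor
        end = ((i + 1) * in_size + out_size - 1) // out_size   # ceil
        window = channels[start:end]
        out.append(sum(window) / len(window))
    return out


def aggregate(feature, candidates, alpha, tau=1.0, k=2, rng=None):
    """Sample k candidate widths and blend the truncated, re-aligned features."""
    rng = rng or random.Random()
    probs = gumbel_softmax(alpha, tau, rng)
    # Draw k distinct candidate indices, proportional to the perturbed probabilities
    remaining = list(range(len(candidates)))
    picked = []
    for _ in range(k):
        weights = [probs[j] for j in remaining]
        j = rng.choices(remaining, weights=weights, k=1)[0]
        picked.append(j)
        remaining.remove(j)
    target = max(candidates[j] for j in picked)
    # Re-normalize the architecture parameters over the sampled subset only
    mix = softmax([alpha[j] for j in picked])
    out = [0.0] * target
    for w, j in zip(mix, picked):
        fragment = channel_wise_interpolate(feature[:candidates[j]], target)
        out = [o + w * f for o, f in zip(out, fragment)]
    return picked, out
```

For example, `aggregate([1.0, 2.0, 3.0, 4.0], candidates=[2, 4], alpha=[0.0, 0.0])` samples both candidates, stretches the 2-channel fragment to `[1.0, 1.0, 2.0, 2.0]`, and averages it with the 4-channel fragment. In an autograd framework the `mix` weights would carry the gradient back to `alpha`.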

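Experimental Results

Research Questions

RQ1: Can NAS directly optimize the network size (width and depth) for pruning, rather than only the topology?
RQ2: Does differentiable sampling of width and depth under a cost-aware objective yield better pruned architectures?
RQ3: Does knowledge transfer from the unpruned model benefit the pruned network's performance?
RQ4: How do the sampling strategy and feature-map alignment (CWI) affect the effectiveness of the search?
RQ5: How do TAS-derived architectures compare with traditional pruning and other NAS baselines on CIFAR and ImageNet?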
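To make the cost-aware objective in RQ2 concrete, here is a minimal sketch of a piecewise cost penalty. It assumes a single layer with hypothetical per-candidate FLOP costs, and a tolerance band around the target budget. The function names, the `tolerance` parameter, and the use of the logarithm of the expected cost are illustrative assumptions, not details confirmed by this review.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_cost(alpha, candidate_costs):
    """Cost averaged under the current size distribution (smooth in alpha)."""
    probs = softmax(alpha)
    return sum(p * c for p, c in zip(probs, candidate_costs))


def argmax_cost(alpha, candidate_costs):
    """Cost of the size that would be selected right now (most probable candidate)."""
    best = max(range(len(alpha)), key=lambda j: alpha[j])
    return candidate_costs[best]


def cost_penalty(alpha, candidate_costs, target, tolerance=0.05):
    """Piecewise penalty: push down when over budget, push up when under, else zero."""
    actual = argmax_cost(alpha, candidate_costs)
    expected = expected_cost(alpha, candidate_costs)
    if actual > (1 + tolerance) * target:
        return math.log(expected)    # minimizing this shrinks the expected cost
    if actual < (1 - tolerance) * target:
        return -math.log(expected)   # minimizing this grows the expected cost
    return 0.0                       # inside the tolerance band: no pressure
```

The branch is chosen from the cost of the currently selected size, while the returned value depends on the expected cost, which is what a gradient-based search could differentiate with respect to `alpha`.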

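Key Findings

TAS combined with KD consistently improves pruning performance over traditional methods on CIFAR-10/100 and ImageNet.
Searching both width and depth yields better accuracy at similar FLOPs than searching only one of them.
Knowledge transfer (KD) from the unpruned network raises the pruned network's accuracy in every experiment.
Channel-wise interpolation together with differentiable architecture parameters effectively aligns and aggregates feature maps of different sizes.
Compared with state-of-the-art pruning methods, TAS achieves higher accuracy at equal or lower FLOPs across several ResNet variants and datasets.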
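The knowledge-transfer step credited in the findings above can be sketched as a standard knowledge-distillation loss: a weighted sum of the usual cross-entropy on the true label and a soft cross-entropy against the temperature-softened teacher outputs. This is a generic KD formulation, not necessarily the exact loss used in the paper. The weight `lam` and the `temperature` default are illustrative assumptions.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Log-probabilities of temperature-scaled logits, computed stably."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]


def kd_loss(student_logits, teacher_logits, label, lam=0.5, temperature=4.0):
    """Blend hard-label cross-entropy with a soft-target matching term.

    student_logits: outputs of the pruned network
    teacher_logits: outputs of the unpruned network
    label: index of the ground-truth class
    """
    # Hard term: ordinary cross-entropy against the ground-truth label
    hard = -log_softmax(student_logits)[label]
    # Soft term: cross-entropy between softened teacher and student distributions
    teacher_probs = [math.exp(lp) for lp in log_softmax(teacher_logits, temperature)]
    student_log_probs = log_softmax(student_logits, temperature)
    soft = -sum(p * lp for p, lp in zip(teacher_probs, student_log_probs))
    return lam * hard + (1 - lam) * soft
```

With `lam=1.0` the function reduces to plain cross-entropy, so the same training loop can be used to compare training with and without distillation.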


This review was generated by AI and reviewed by a human editor.