QUICK REVIEW

[Paper Review] Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks

Torsten Hoefler, Dan Alistarh | arXiv (Cornell University) | 2021-01-31

Machine Learning and ELM | Citations: 341

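One-line summary

This survey comprehensively reviews sparsification techniques for deep networks, covering both pruning and growth. It details methods, theory, and hardware considerations that enable efficient inference and training.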

ABSTRACT

The growing energy and performance costs of deep learning have driven the community to reduce the size of neural networks by selectively pruning components. Similarly to their biological counterparts, sparse networks generalize just as well, if not better than, the original dense networks. Sparsity can reduce the memory footprint of regular networks to fit mobile devices, as well as shorten training time for ever growing networks. In this paper, we survey prior work on sparsity in deep learning and provide an extensive tutorial of sparsification for both inference and training. We describe approaches to remove and add elements of neural networks, different training strategies to achieve model sparsity, and mechanisms to exploit sparsity in practice. Our work distills ideas from more than 300 research papers and provides guidance to practitioners who wish to utilize sparsity today, as well as to researchers whose goal is to push the frontier forward. We include the necessary background on mathematical methods in sparsification, describe phenomena such as early structure adaptation, the intricate relations between sparsity and the training process, and show techniques for achieving acceleration on real hardware. We also define a metric of pruned parameter efficiency that could serve as a baseline for comparison of different sparse networks. We close by speculating on how sparsity can improve future workloads and outline major open problems in the field.

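Research motivation and goals

Survey and categorize sparsification approaches for neural networks.
Explain the mathematical foundations of sparsity and practical training strategies.
Provide guidance for practitioners who want to apply sparsity today.
Discuss hardware implications and the open problems that will guide future research.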

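Proposed method

Classifies sparsification by what is pruned, when pruning happens, and how sparsity is achieved.
Describes deterministic and probabilistic formulations of training with sparsity, including SGD, Fisher/Hessian perspectives, and Bayesian variational methods.
Describes pruning and regrowth during training, including mechanisms that add connections back to preserve model capacity.
Outlines practical techniques for inducing sparsity across convolutional and transformer architectures.
Discusses software and hardware considerations for accelerating sparse models and proposes evaluation benchmarks.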
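
The pruning and regrowth ideas listed above can be illustrated with a minimal NumPy sketch. It is not the survey's own algorithm: it assumes global magnitude pruning as the removal criterion and uniform random regrowth as the growth criterion, and the function names `magnitude_prune` and `prune_and_regrow` are hypothetical names chosen for this example.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Return a boolean mask that keeps the largest-magnitude weights."""
    k = int(round(sparsity * weights.size))  # number of weights to remove
    mask = np.ones(weights.shape, dtype=bool)
    if k == 0:
        return mask
    # Flat indices of the k smallest |w| across the whole tensor.
    drop = np.argsort(np.abs(weights), axis=None, kind="stable")[:k]
    mask.flat[drop] = False
    return mask

def prune_and_regrow(weights, mask, fraction, rng):
    """One sparse-training mask update (assumed scheme): drop the weakest
    active weights, then re-enable the same number of previously inactive
    positions chosen uniformly at random, so the sparsity level is unchanged."""
    active = np.flatnonzero(mask)
    inactive = np.flatnonzero(~mask)
    n = min(int(fraction * active.size), inactive.size)
    new_mask = mask.copy()
    if n == 0:
        return new_mask
    order = np.argsort(np.abs(weights.flat[active]), kind="stable")
    weakest = active[order[:n]]
    new_mask.flat[weakest] = False
    # Candidates exclude the positions pruned in this same step.
    regrown = rng.choice(inactive, size=n, replace=False)
    new_mask.flat[regrown] = True
    return new_mask

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))
mask = magnitude_prune(w, sparsity=0.75)          # keep 16 of 64 weights
new_mask = prune_and_regrow(w, mask, fraction=0.25, rng=rng)
sparse_w = w * new_mask                           # masked weights used in the forward pass
```

In a real training loop the mask update would run every few hundred steps, and regrown weights would usually be reset to zero or a small initial value rather than reusing the stale dense value, which this sketch does for brevity.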

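Experimental results

Research questions

RQ1: What are the main sparsification techniques for pruning and growth in deep neural networks?
RQ2: How does sparsity interact with training dynamics and generalization?
RQ3: What are effective ways to exploit sparsity for inference and training on real hardware?
RQ4: Which metrics and benchmarks should be used to compare sparse networks?
RQ5: What open problems remain in advancing sparse deep learning?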

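Key findings

Sparse methods can reduce model size by 10-100x with little or no loss in accuracy.
Sparsification has the potential to deliver memory, compute, and energy savings suited to both mobile and large-scale models.
Pruning techniques are tied to training dynamics and can provide regularization benefits and robustness effects.
Variational and Bayesian perspectives offer principled ways to induce and measure sparsity.
Common benchmarks are needed to enable fair comparison amid the rapid progress of new methods.
Sparse training and growth strategies can maintain, or even improve, performance while reducing resource usage.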
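
As a rough illustration of how a parameter reduction translates into memory savings, the sketch below compares dense storage with a CSR-style sparse layout. The byte widths, the 1024 x 1024 layer shape, and the helper names `dense_bytes` and `csr_bytes` are assumptions made for this example, not figures from the survey.

```python
def dense_bytes(rows, cols, value_bytes=4):
    """Bytes needed to store every weight of a dense matrix."""
    return rows * cols * value_bytes

def csr_bytes(rows, cols, density, value_bytes=4, index_bytes=4):
    """Approximate bytes for a CSR layout: one value and one column
    index per nonzero, plus a row-pointer array of length rows + 1."""
    nnz = round(rows * cols * density)
    return nnz * (value_bytes + index_bytes) + (rows + 1) * index_bytes

rows, cols = 1024, 1024  # hypothetical layer shape
for sparsity in (0.90, 0.99):
    ratio = dense_bytes(rows, cols) / csr_bytes(rows, cols, 1.0 - sparsity)
    print(f"sparsity {sparsity:.0%}: storage shrinks about {ratio:.1f}x")
```

Under these assumptions, index overhead roughly halves the gain: removing 90% of the weights shrinks storage by a bit under 5x rather than 10x. Narrower indices or structured (block) sparsity reduce that overhead.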


This review was generated by AI and reviewed by a human editor.