QUICK REVIEW

[Paper Review] N2N Learning: Network to Network Compression via Policy Gradient Reinforcement Learning

Anubhav Ashok, Nicholas Rhinehart | arXiv (Cornell University) | 2017. 09. 18.

Advanced Neural Network Applications | References: 35 | Citations: 116

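One-Line Summary

This paper presents a two-stage reinforcement learning method (layer removal followed by layer shrinkage) that uses policy gradients and knowledge distillation to automatically compress a teacher network into a lightweight student network that retains high accuracy.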

ABSTRACT

While bigger and deeper neural network architectures continue to advance the state-of-the-art for many computer vision tasks, real-world adoption of these networks is impeded by hardware and speed constraints. Conventional model compression methods attempt to address this problem by modifying the architecture manually or using pre-defined heuristics. Since the space of all reduced architectures is very large, modifying the architecture of a deep neural network in this way is a difficult task. In this paper, we tackle this issue by introducing a principled method for learning reduced network architectures in a data-driven way using reinforcement learning. Our approach takes a larger 'teacher' network as input and outputs a compressed 'student' network derived from the 'teacher' network. In the first stage of our method, a recurrent policy network aggressively removes layers from the large 'teacher' model. In the second stage, another recurrent policy network carefully reduces the size of each remaining layer. The resulting network is then evaluated to obtain a reward -- a score based on the accuracy and compression of the network. Our approach uses this reward signal with policy gradients to train the policies to find a locally optimal student network. Our experiments show that we can achieve compression rates of more than 10x for models such as ResNet-34 while maintaining similar performance to the input 'teacher' network. We also present a valuable transfer learning result which shows that policies which are pre-trained on smaller 'teacher' networks can be used to rapidly speed up training on larger 'teacher' networks.

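Motivation and Objectives

Motivate automatic, data-driven network compression that satisfies hardware constraints.
Develop a principled RL framework that searches for compact architectures derived from a teacher network.
Introduce a two-stage action scheme (layer removal, then layer shrinkage) to explore the architecture space efficiently.
Use knowledge distillation to train the compressed student models.
Demonstrate compression effectiveness across several datasets and the transferability of the learned policies.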

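Proposed Method

Formulate teacher-to-student compression as a Markov decision process over network architectures.
Use two policy networks in sequence: a layer removal policy (a binary keep/remove decision per layer) and a layer shrinkage policy (reducing the parameters of each remaining layer).
Optimize the policies with the REINFORCE policy gradient using the reward R = Rc × Ra.
Rc is a nonlinear compression reward based on parameter count; Ra is the ratio of student to teacher validation accuracy.
Incorporate hardware constraints of the form Ax ≤ b by restricting the reward with a gradual penalty.
Train the student network by knowledge distillation from the teacher, using an L2 loss between the student's outputs and the teacher's logits to guide learning.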
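
The reward and the distillation loss from the list above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation. Several details are assumptions that the review does not state: the specific nonlinear form Rc = C(2 - C) with C = 1 - (student parameters / teacher parameters), the use of a single fixed penalty value in place of the gradual penalty, and all function and parameter names (`compression_reward`, `violates_constraints`, `penalty`, and so on), which are hypothetical.

```python
from typing import Optional, Sequence

Matrix = Sequence[Sequence[float]]


def compression_reward(student_params: int, teacher_params: int) -> float:
    """Rc: nonlinear reward on the fraction of parameters removed.

    ASSUMPTION: Rc = C * (2 - C) with C = 1 - student/teacher. The review
    only says that Rc is nonlinear in the parameter count.
    """
    c = 1.0 - student_params / teacher_params
    return c * (2.0 - c)


def accuracy_reward(student_acc: float, teacher_acc: float) -> float:
    """Ra: ratio of student to teacher validation accuracy."""
    return student_acc / teacher_acc


def violates_constraints(A: Matrix, x: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if any row of Ax exceeds the matching entry of b."""
    return any(
        sum(a_ij * x_j for a_ij, x_j in zip(row, x)) > b_i
        for row, b_i in zip(A, b)
    )


def reward(
    student_params: int,
    teacher_params: int,
    student_acc: float,
    teacher_acc: float,
    A: Optional[Matrix] = None,
    x: Optional[Sequence[float]] = None,
    b: Optional[Sequence[float]] = None,
    penalty: float = -1.0,
) -> float:
    """R = Rc * Ra, or a penalty when the constraint Ax <= b is violated.

    ASSUMPTION: one fixed penalty value stands in for the gradual penalty.
    """
    if A is not None and x is not None and b is not None:
        if violates_constraints(A, x, b):
            return penalty
    rc = compression_reward(student_params, teacher_params)
    ra = accuracy_reward(student_acc, teacher_acc)
    return rc * ra


def distillation_loss(
    student_logits: Sequence[Sequence[float]],
    teacher_logits: Sequence[Sequence[float]],
) -> float:
    """Batch mean of the squared L2 distance between student and teacher logits."""
    per_example = [
        sum((s - t) ** 2 for s, t in zip(s_row, t_row))
        for s_row, t_row in zip(student_logits, teacher_logits)
    ]
    return sum(per_example) / len(per_example)


if __name__ == "__main__":
    # Hypothetical numbers: a student with 10x fewer parameters than its teacher.
    r = reward(2_000_000, 20_000_000, student_acc=0.92, teacher_acc=0.93)
    print(f"reward: {r:.4f}")
```

Because the two factors are multiplied, a student scores well only if it is both small and accurate: a tiny network with poor accuracy, or an accurate network with almost no compression, both receive a low reward.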

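Experimental Results

Research Questions

RQ1: Can reinforcement learning automatically discover compact student architectures that compress a larger teacher network while preserving accuracy?
RQ2: Does the two-stage action strategy (layer removal followed by layer shrinkage) scale to modern architectures and datasets?
RQ3: How well do learned compression policies transfer across similar architectures or to larger teachers?
RQ4: Can hardware constraints be integrated into the reward effectively enough to yield models that are deployable in practice?
RQ5: Does distillation from the teacher improve the performance of the compressed student network?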

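Key Findings

Achieves substantial compression (for example, more than 10x for ResNet-34) while keeping accuracy close to the teacher's.
Two-stage policy learning speeds up the search by separating macro-level decisions (layer removal) from micro-level decisions (layer shrinkage).
Policies trained on smaller teachers transfer to larger teachers and speed up training in the new setting.
Outperforms pruning and hand-designed knowledge distillation baselines on several datasets, including MNIST, CIFAR-10/100, SVHN, and Caltech-256.
The hardware-constrained reward yields models that respect size limits, supporting practical applicability.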


This review was generated by AI and reviewed by a human editor.