QUICK REVIEW

[Paper Review] ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware

Han Cai, Ligeng Zhu | arXiv (Cornell University) | 2018-12-02

Advanced Neural Network Applications | References: 36 | Citations: 284

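One-Line Summary

ProxylessNAS learns neural architectures directly on the target task and hardware, using path binarization to cut memory use, and achieves state-of-the-art results on CIFAR-10 and ImageNet under latency constraints.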

ABSTRACT

Neural architecture search (NAS) has a great impact by automatically designing effective neural network architectures. However, the prohibitive computational demand of conventional NAS algorithms (e.g. $10^4$ GPU hours) makes it difficult to \emph{directly} search the architectures on large-scale tasks (e.g. ImageNet). Differentiable NAS can reduce the cost of GPU hours via a continuous representation of network architecture but suffers from the high GPU memory consumption issue (grow linearly w.r.t. candidate set size). As a result, they need to utilize~\emph{proxy} tasks, such as training on a smaller dataset, or learning with only a few blocks, or training just for a few epochs. These architectures optimized on proxy tasks are not guaranteed to be optimal on the target task. In this paper, we present \emph{ProxylessNAS} that can \emph{directly} learn the architectures for large-scale target tasks and target hardware platforms. We address the high memory consumption issue of differentiable NAS and reduce the computational cost (GPU hours and GPU memory) to the same level of regular training while still allowing a large candidate set. Experiments on CIFAR-10 and ImageNet demonstrate the effectiveness of directness and specialization. On CIFAR-10, our model achieves 2.08\% test error with only 5.7M parameters, better than the previous state-of-the-art architecture AmoebaNet-B, while using 6$\times$ fewer parameters. On ImageNet, our model achieves 3.1\% better top-1 accuracy than MobileNetV2, while being 1.2$\times$ faster with measured GPU latency. We also apply ProxylessNAS to specialize neural architectures for hardware with direct hardware metrics (e.g. latency) and provide insights for efficient CNN architecture design.

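Motivation and Objectives

Learn CNN architectures directly on large-scale target tasks (e.g. ImageNet), without proxy tasks.
Enable architecture search across diverse hardware platforms (GPU, CPU, mobile).
Bring the search cost down to the level of regular training through path-level pruning and binarization.
Remove the constraint of repeating the same block, expanding architectural diversity.
Provide hardware-aware architecture insights for efficient inference.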

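Proposed Method

Build an over-parameterized network in which every mixed operation contains all candidate paths.
Binarize the architecture parameters so that only one path is active at run time, reducing memory use to the level of regular training.
Train the weights and the architecture parameters with alternating updates, then derive a compact architecture by pruning the lowest-weighted paths.
Model hardware latency as a continuous, differentiable loss term (latency regularization) so that latency is optimized together with accuracy.
Provide a REINFORCE-based alternative for learning the binarized paths when needed.
For non-differentiable hardware metrics, use a latency prediction model to guide the architecture search on mobile hardware.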
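
The path binarization and pruning steps listed above can be illustrated with a minimal, framework-free sketch. It is not the authors' implementation: the names `sample_binary_gates`, `mixed_op_forward`, and `derive_op` are hypothetical, the candidate operations are toy scalar functions, and the `alpha` values are made up for illustration. The sketch only shows the forward pass with one active path and the final selection; it does not cover how gradients for the architecture parameters are estimated or the alternating weight/architecture updates.

```python
import math
import random

def softmax(alpha):
    """Turn real-valued architecture parameters into path probabilities."""
    m = max(alpha)
    exps = [math.exp(a - m) for a in alpha]
    total = sum(exps)
    return [e / total for e in exps]

def sample_binary_gates(alpha, rng):
    """Sample a one-hot gate vector: exactly one candidate path is active."""
    probs = softmax(alpha)
    active = rng.choices(range(len(alpha)), weights=probs, k=1)[0]
    return [1 if i == active else 0 for i in range(len(alpha))]

def mixed_op_forward(x, candidate_ops, gates):
    """Evaluate only the active path; inactive paths are never computed."""
    active = gates.index(1)
    return candidate_ops[active](x)

def derive_op(alpha):
    """After search, keep the highest-weighted path and prune the rest."""
    return max(range(len(alpha)), key=lambda i: alpha[i])

# Hypothetical candidates standing in for e.g. a conv path, identity, and zero.
candidate_ops = [lambda x: 2.0 * x, lambda x: x, lambda x: 0.0]
alpha = [0.3, 1.2, -0.5]  # illustrative values, not taken from the paper

rng = random.Random(0)
gates = sample_binary_gates(alpha, rng)
output = mixed_op_forward(3.0, candidate_ops, gates)
kept = derive_op(alpha)  # index of the path that survives pruning
```

Because only the sampled path is evaluated, the memory needed for activations is that of a single candidate rather than of all candidates, which is the property the method relies on to match regular training cost.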

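Experimental Results

Research Questions

RQ1: Can NAS be run directly on a large-scale task (e.g. ImageNet) and on the target hardware, without proxy tasks?
RQ2: Does path binarization enable memory-efficient gradient-based NAS at large scale?
RQ3: Can including latency as a differentiable objective produce hardware-aware architectures?
RQ4: Does searching on the target hardware yield architectures with a better accuracy/latency trade-off than proxy-based methods?
RQ5: What hardware-specific architecture patterns emerge when optimizing for different platforms (GPU, CPU, mobile)?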

Key Results

Model	Parameters	Test error (%)
AmoebaNet-B + c/o	34.9M	2.13
Proxyless-R + c/o	5.8M	2.30
Proxyless-G + c/o	5.7M	2.08
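
As a quick arithmetic check on the table above, the following sketch recomputes the parameter ratio and the error gap between AmoebaNet-B and Proxyless-G. The dictionary layout and variable names are only for illustration; the numbers are copied from the table.

```python
# (parameters in millions, test error in %) as listed in the table above.
results = {
    "AmoebaNet-B + c/o": (34.9, 2.13),
    "Proxyless-R + c/o": (5.8, 2.30),
    "Proxyless-G + c/o": (5.7, 2.08),
}

amoeba_params, amoeba_err = results["AmoebaNet-B + c/o"]
proxyless_params, proxyless_err = results["Proxyless-G + c/o"]

param_ratio = amoeba_params / proxyless_params  # roughly 6.1
error_gap = amoeba_err - proxyless_err          # about 0.05 percentage points

print(f"parameter ratio: {param_ratio:.1f}x, error gap: {error_gap:.2f} pp")
```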

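On CIFAR-10, Proxyless-G reaches 2.08% test error with 5.7M parameters, outperforming AmoebaNet-B (2.13%) with about 6× fewer parameters.
On ImageNet, the GPU-specialized ProxylessNAS model reaches 75.1% top-1 accuracy (3.1% higher than MobileNetV2) and is 1.2× faster in measured GPU latency.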
On mobile, Proxyless-R reaches 74.6% top-1 accuracy at 78 ms latency, with a search cost of 200 GPU hours (200× less than MnasNet).
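Architectures specialized for different hardware show distinct characteristics: the GPU model is shallower and wider and favors larger MBConv operations, whereas the CPU model is deeper and narrower.
Latency regularization matters: without it, the searched models give worse accuracy for their latency, which shows the need for hardware-aware NAS.
ProxylessNAS delivers state-of-the-art results on CIFAR-10 and ImageNet under latency constraints and offers insights for efficient CNN design across different hardware.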
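
The latency regularization referred to in the results above can be sketched as an expected-latency term added to the training loss. This is a simplified illustration under stated assumptions: the per-candidate latencies are hypothetical values standing in for the output of a latency prediction model, `lam` is an illustrative trade-off weight, and any weight-decay term is omitted. The function names are not from the paper's code.

```python
import math

def softmax(alpha):
    """Convert architecture parameters into path probabilities."""
    m = max(alpha)
    exps = [math.exp(a - m) for a in alpha]
    total = sum(exps)
    return [e / total for e in exps]

def expected_latency(alpha_per_block, latency_per_block):
    """Sum over blocks of the probability-weighted candidate latencies.

    Because the result is a smooth function of the path probabilities,
    it can be differentiated with respect to the architecture parameters.
    """
    total = 0.0
    for alpha, latencies in zip(alpha_per_block, latency_per_block):
        probs = softmax(alpha)
        total += sum(p * lat for p, lat in zip(probs, latencies))
    return total

def latency_regularized_loss(ce_loss, alpha_per_block, latency_per_block, lam):
    """Task loss plus a scaled expected-latency penalty."""
    return ce_loss + lam * expected_latency(alpha_per_block, latency_per_block)

# Two blocks with hypothetical predicted latencies in milliseconds.
alpha_per_block = [[0.0, 0.0, 0.0], [0.0, 0.0]]   # uniform path probabilities
latency_per_block = [[4.0, 2.0, 0.0], [6.0, 3.0]]

exp_lat = expected_latency(alpha_per_block, latency_per_block)
loss = latency_regularized_loss(1.0, alpha_per_block, latency_per_block, lam=0.1)
```

With uniform probabilities the expected latency is just the sum of the per-block mean latencies; raising the architecture parameter of a slow candidate increases the penalty, which is what pushes the search toward faster paths.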

This review was generated by AI and reviewed by a human editor.