QUICK REVIEW

[Paper Review] One-Shot Neural Architecture Search Through A Posteriori Distribution Guided Sampling.

Yizhou Zhou, Xiaoyan Sun | arXiv (Cornell University) | 2019. 06. 23.

Domain Adaptation and Few-Shot Learning | Citations: 3

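One-Line Summary

This paper proposes a one-shot neural architecture search method that improves efficiency and accuracy by sampling sub-networks according to an estimated joint a posteriori distribution over architecture and weights. By modeling this distribution with variational inference and a hybrid network representation, it reduces the number of sampled sub-networks by orders of magnitude. On CIFAR-10 it speeds up the search by 20x while reaching higher precision than the best network found by existing NAS methods, and across CIFAR-10, CIFAR-100, and ImageNet it reports the best trade-off between precision and speed among NAS methods.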

ABSTRACT

The emergence of one-shot approaches has greatly advanced the research on neural architecture search (NAS). Recent approaches train an over-parameterized super-network (one-shot model) and then sample and evaluate a number of sub-networks, which inherit weights from the one-shot model. The overall searching cost is significantly reduced as training is avoided for sub-networks. However, the network sampling process is casually treated and the inherited weights from an independently trained super-network perform sub-optimally for sub-networks. In this paper, we propose a novel one-shot NAS scheme to address the above issues. The key innovation is to explicitly estimate the joint a posteriori distribution over network architecture and weights, and sample networks for evaluation according to it. This brings two benefits. First, network sampling under the guidance of a posteriori probability is more efficient than conventional random or uniform sampling. Second, the network architecture and its weights are sampled as a pair to alleviate the sub-optimal weights problem. Note that estimating the joint a posteriori distribution is not a trivial problem. By adopting variational methods and introducing a hybrid network representation, we convert the distribution approximation problem into an end-to-end neural network training problem which is neatly approached by variational dropout. As a result, the proposed method reduces the number of sampled sub-networks by orders of magnitude. We validate our method on the fundamental image classification task. Results on Cifar-10, Cifar-100 and ImageNet show that our method strikes the best trade-off between precision and speed among NAS methods. On Cifar-10, we speed up the searching process by 20x and achieve a higher precision than the best network found by existing NAS methods.
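
The abstract's step from "estimating the joint a posteriori distribution" to "an end-to-end neural network training problem" can be made concrete with the generic variational lower bound. This is the standard evidence lower bound, not a formula quoted from the paper, and the symbols are assumptions introduced here: $\alpha$ for the architecture, $w$ for the weights, $\mathcal{D}$ for the training data, and $q_\phi$ for the approximating distribution with trainable parameters $\phi$.

```latex
\log p(\mathcal{D})
  \;\ge\;
  \mathbb{E}_{q_\phi(\alpha, w)}\!\left[ \log p(\mathcal{D} \mid \alpha, w) \right]
  \;-\;
  \mathrm{KL}\!\left( q_\phi(\alpha, w) \,\|\, p(\alpha, w) \right)
```

Maximizing the right-hand side over $\phi$ pulls $q_\phi(\alpha, w)$ toward the true posterior $p(\alpha, w \mid \mathcal{D})$. The first term is an expected data log-likelihood and the second acts as a regularizer, so the objective can be optimized with ordinary stochastic gradient training. The paper's exact objective and notation may differ.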

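Motivation and Objectives

To address the inefficiency and sub-optimal performance of conventional one-shot NAS methods that sample sub-networks randomly or uniformly.
To mitigate the problem that weights inherited from an independently trained super-network are sub-optimal for individual sub-networks.
To develop a method that samples architecture and weights as a pair from a learned posterior distribution, improving search efficiency and accuracy.
To reduce the number of sub-network evaluations while maintaining or improving final model performance.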

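Proposed Method

The method uses variational inference to estimate the joint a posteriori distribution over network architecture and weights.
It introduces a hybrid network representation so that the joint distribution can be parameterized effectively.
Using variational dropout, it recasts the distribution approximation problem as an end-to-end neural network training problem.
Sub-networks are sampled according to the estimated posterior probability, so each architecture is drawn together with its weights rather than inheriting them from an independently trained super-network.
The entire procedure is trained in a single end-to-end fashion, enabling an efficient, differentiable search.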
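
The idea of drawing an architecture and its weights as a pair can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the paper's implementation: the posterior is assumed to factorize into a per-operation Bernoulli keep probability (a stand-in for what variational dropout would learn) and a Gaussian over a single scalar weight per operation. The function name `sample_subnetwork`, the dictionary layout, and the operation names are all hypothetical.

```python
import random


def sample_subnetwork(keep_prob, weight_mean, weight_std, rng):
    """Draw one (architecture, weights) pair from a toy factorized posterior.

    All three inputs are hypothetical stand-ins for learned quantities:
      keep_prob[edge][op]   -- probability that `op` is kept on `edge`
      weight_mean[edge][op] -- mean of a scalar weight for that op
      weight_std[edge][op]  -- standard deviation of that weight
    """
    architecture, weights = {}, {}
    for edge, probs in keep_prob.items():
        # Architecture sample: keep each candidate op with its own probability.
        kept = [op for op, p in probs.items() if rng.random() < p]
        architecture[edge] = kept
        # Weight sample: drawn in the same pass, only for the ops that were kept.
        weights[edge] = {
            op: rng.gauss(weight_mean[edge][op], weight_std[edge][op])
            for op in kept
        }
    return architecture, weights


if __name__ == "__main__":
    rng = random.Random(0)
    ops = ["conv3x3", "conv5x5", "skip"]
    edges = ["edge0", "edge1"]
    # Made-up values: this toy posterior favors conv3x3 on every edge.
    keep_prob = {e: {"conv3x3": 0.9, "conv5x5": 0.2, "skip": 0.1} for e in edges}
    weight_mean = {e: {op: 0.0 for op in ops} for e in edges}
    weight_std = {e: {op: 0.1 for op in ops} for e in edges}
    arch, weights = sample_subnetwork(keep_prob, weight_mean, weight_std, rng)
    print(arch)
    print(weights)
```

Setting every keep probability to the same value would correspond to uniform sampling, the baseline the review contrasts against. Concentrating probability on promising operations is what would let a guided sampler evaluate far fewer candidates.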

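Experimental Results

Research Questions

RQ1: Can a posterior distribution over architecture and weights improve sampling efficiency in one-shot NAS?
RQ2: How does posterior-guided sampling compare with random or uniform sampling in search efficiency and accuracy?
RQ3: Can sampling architecture and weights as a pair alleviate the sub-optimality of weights inherited from the super-network?
RQ4: By how much can the number of sub-network evaluations be reduced while maintaining or improving search performance?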

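Key Results

The proposed method reduces the number of sampled sub-networks by orders of magnitude compared with existing one-shot NAS approaches.
On CIFAR-10, it achieves higher precision than the best network found by existing NAS methods.
On CIFAR-10, the search process is sped up by 20x while strong performance is maintained.
Across CIFAR-10, CIFAR-100, and ImageNet, it strikes the best trade-off between precision and speed among NAS methods.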

This review was generated by AI and reviewed by a human editor.