QUICK REVIEW

[Paper Review] Neural Architecture Search on ImageNet in Four GPU Hours: A Theoretically Inspired Perspective

Wuyang Chen, Xinyu Gong|arXiv (Cornell University)|2021. 02. 23.

Advanced Neural Network Applications|References 67|Citations 61

One-Line Summary

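TE-NAS ranks architectures by the NTK spectrum and the number of linear regions, adds a pruning-based search strategy to perform training-free neural architecture search, and achieves competitive NAS results at substantially reduced cost.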

ABSTRACT

Neural Architecture Search (NAS) has been explosively studied to automate the discovery of top-performer neural networks. Current works require heavy training of supernet or intensive architecture evaluations, thus suffering from heavy resource consumption and often incurring search bias due to truncated training or approximations. Can we select the best neural architectures without involving any training and eliminate a drastic portion of the search cost? We provide an affirmative answer, by proposing a novel framework called training-free neural architecture search (TE-NAS). TE-NAS ranks architectures by analyzing the spectrum of the neural tangent kernel (NTK) and the number of linear regions in the input space. Both are motivated by recent theory advances in deep networks and can be computed without any training and any label. We show that: (1) these two measurements imply the trainability and expressivity of a neural network; (2) they strongly correlate with the network's test accuracy. Further on, we design a pruning-based NAS mechanism to achieve a more flexible and superior trade-off between the trainability and expressivity during the search. In NAS-Bench-201 and DARTS search spaces, TE-NAS completes high-quality search but only costs 0.5 and 4 GPU hours with one 1080Ti on CIFAR-10 and ImageNet, respectively. We hope our work inspires more attempts in bridging the theoretical findings of deep networks and practical impacts in real NAS applications. Code is available at: https://github.com/VITA-Group/TENAS.

Motivation and Objectives

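Reduce the cost of NAS by eliminating the training requirement and using theoretically grounded indicators of trainability and expressivity.
Identify training-free indicators (the NTK spectrum and the number of linear regions) that correlate with test accuracy.
Develop a pruning-based NAS workflow that explores architectures efficiently while balancing trainability and expressivity.
Demonstrate the effectiveness of TE-NAS on NAS-Bench-201, on CIFAR-10 in the DARTS space, and on ImageNet in the DARTS space.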

Proposed Method

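Proposes TE-NAS, a training-free NAS framework built on two indicators: the NTK condition number (kappa_N), which reflects trainability, and the number of linear regions (R_N), which reflects expressivity.
Measures kappa_N and R_N without training or labels, and shows empirically that both correlate with test accuracy.
Combines the two indicators as equally weighted relative rankings to guide architecture selection.
Introduces an importance-based pruning mechanism that progressively reduces the supernet to a single-path architecture and speeds up the search.
Validates TE-NAS at training-free search cost on the NAS-Bench-201 and DARTS spaces, including CIFAR-10 and ImageNet.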
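
The equally weighted rank combination described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the tie-breaking rule (original index order), and the four sets of measurements are all hypothetical, chosen only to show how a lower kappa_N and a higher R_N each contribute one relative rank to a summed score.

```python
def rank_positions(values, reverse=False):
    """Return the rank (0 = best) of each entry; ties keep original index order."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=reverse)
    ranks = [0] * len(values)
    for position, index in enumerate(order):
        ranks[index] = position
    return ranks


def combined_ranking(kappas, regions):
    """Sum the two relative ranks with equal weight (lower total = better)."""
    kappa_ranks = rank_positions(kappas)                  # lower kappa_N is better
    region_ranks = rank_positions(regions, reverse=True)  # higher R_N is better
    return [k + r for k, r in zip(kappa_ranks, region_ranks)]


# Hypothetical measurements for four candidate architectures (not from the paper).
kappas = [120.0, 45.0, 300.0, 60.0]
regions = [800, 1200, 400, 950]

scores = combined_ranking(kappas, regions)
best = min(range(len(scores)), key=lambda i: scores[i])
print(scores, best)
```

Working with ranks rather than raw values sidesteps the fact that kappa_N and R_N live on very different scales, so neither indicator dominates the sum.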

Experimental Results

Research Questions

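RQ1: Can training-free, label-free indicators such as the NTK spectrum and the number of linear regions effectively rank NAS architectures by their eventual test accuracy?
RQ2: Does a pruning-based, training-free NAS workflow yield competitive architectures at a fraction of the cost of training-based NAS methods?
RQ3: How do trainability (kappa_N) and expressivity (R_N) influence operator selection in NAS across different search spaces?
RQ4: What practical search-time savings and performance trade-offs arise when TE-NAS is applied to CIFAR-10 and ImageNet?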

Key Results

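Two training-free indicators correlate with performance: a lower NTK condition number kappa_N (trainability) and a larger number of linear regions R_N (expressivity) are both associated with higher test accuracy.
TE-NAS achieves competitive NAS results with substantially reduced search time: 0.5 GPU hours on CIFAR-10 and 4 GPU hours on ImageNet, using a single 1080Ti.
On NAS-Bench-201, TE-NAS achieved the highest accuracy among the reported methods under training-free search on CIFAR-10, CIFAR-100, and ImageNet-16-120 (mean and standard deviation are reported).
On CIFAR-10 in the DARTS space, TE-NAS reached a 2.63% test error at a search cost of 0.05 GPU-days (training-free).
On ImageNet in the mobile setting (DARTS space), TE-NAS reached 24.5% top-1 and 7.5% top-5 error at a search cost of 0.17 GPU-days (training-free).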
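
The results above mix GPU-hours and GPU-days, so a short conversion helps when comparing them. The sketch below assumes only that one GPU-day equals 24 GPU-hours; the helper name is hypothetical. Under that assumption, 0.17 GPU-days comes to roughly 4 GPU-hours, in line with the ImageNet figure quoted in the abstract, and 0.05 GPU-days comes to 1.2 GPU-hours.

```python
HOURS_PER_DAY = 24  # assumption: one GPU-day is 24 GPU-hours on a single GPU


def gpu_days_to_hours(gpu_days):
    """Convert a search cost in GPU-days to GPU-hours."""
    return gpu_days * HOURS_PER_DAY


# Search costs reported for the DARTS space in the results above.
cifar10_darts_hours = gpu_days_to_hours(0.05)
imagenet_darts_hours = gpu_days_to_hours(0.17)

print(round(cifar10_darts_hours, 2), round(imagenet_darts_hours, 2))
```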

This review was generated by AI and reviewed by a human editor.