QUICK REVIEW

[Paper Review] ZiCo: Zero-shot NAS via Inverse Coefficient of Variation on Gradients

Guihong Li, Yuedong Yang|arXiv (Cornell University)|2023. 01. 26.

Advanced Neural Network Applications|Citations: 19

One-line summary

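ZiCo is a training-free zero-shot NAS proxy built on the mean and variance of gradients across samples. On several NAS benchmarks it correlates with test accuracy consistently better than #Params, and it yields competitive results with far less search time.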

ABSTRACT

Neural Architecture Search (NAS) is widely used to automatically obtain the neural network with the best performance among a large number of candidate architectures. To reduce the search time, zero-shot NAS aims at designing training-free proxies that can predict the test performance of a given architecture. However, as shown recently, none of the zero-shot proxies proposed to date can actually work consistently better than a naive proxy, namely, the number of network parameters (#Params). To improve this state of affairs, as the main theoretical contribution, we first reveal how some specific gradient properties across different samples impact the convergence rate and generalization capacity of neural networks. Based on this theoretical analysis, we propose a new zero-shot proxy, ZiCo, the first proxy that works consistently better than #Params. We demonstrate that ZiCo works better than State-Of-The-Art (SOTA) proxies on several popular NAS-Benchmarks (NASBench101, NATSBench-SSS/TSS, TransNASBench-101) for multiple applications (e.g., image classification/reconstruction and pixel-level prediction). Finally, we demonstrate that the optimal architectures found via ZiCo are as competitive as the ones found by one-shot and multi-shot NAS methods, but with much less search time. For example, ZiCo-based NAS can find optimal architectures with 78.1%, 79.4%, and 80.4% test accuracy under inference budgets of 450M, 600M, and 1000M FLOPs, respectively, on ImageNet within 0.4 GPU days. Our code is available at https://github.com/SLDGroup/ZiCo.

Research motivation and objectives

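Establish the need for training-free proxies in NAS and address the failure of earlier zero-shot proxies to consistently outperform #Params.
Analyze theoretically how the mean and variance of gradients across samples affect convergence and generalization.
Develop ZiCo, a zero-shot proxy that uses gradient statistics to outperform existing proxies on mainstream NAS benchmarks.
Demonstrate ZiCo's effectiveness on NAS benchmarks and ImageNet-scale search while reducing search time.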

Proposed method

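The authors analyze how the mean and standard deviation of gradients across samples affect training convergence and generalization, starting from a linear regression setting and extending to ReLU MLPs.
They prove that a larger absolute mean gradient across samples speeds up convergence and that a smaller gradient variance improves generalization, and they connect these results to the eigenvalues of the Gram matrix.
ZiCo is a zero-shot proxy evaluated at the initial parameters, without training: for each layer, the ratio of the expected gradient magnitude to the standard deviation of the gradient magnitude is summed over that layer's weights, and the logarithms of these per-layer sums are added across layers. It is computed from two batches (N=2).
The ZiCo metric is architecture-agnostic for CNNs and depends only on the initial parameters, which keeps the evaluation zero-shot.
They show that on NASBench101, NATS-Bench-SSS/TSS, and TransNASBench-101, ZiCo correlates with test accuracy more strongly than other zero-shot proxies and #Params.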
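
The per-layer computation described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `zico_score`, the nested-list layout `grads_per_batch[batch][layer][weight]`, the use of the population standard deviation, and the choice to skip weights whose standard deviation is zero are all hypothetical choices made here. In practice the gradients would come from backpropagating the loss on each batch at initialization.

```python
import math
from statistics import mean, pstdev


def zico_score(grads_per_batch, eps=1e-12):
    """Sketch of a ZiCo-style score (hypothetical helper, not the official code).

    grads_per_batch[i][l][w] is the gradient of the loss on batch i with
    respect to weight w of layer l, all taken at the initial parameters.
    """
    num_layers = len(grads_per_batch[0])
    score = 0.0
    for l in range(num_layers):
        num_weights = len(grads_per_batch[0][l])
        layer_sum = 0.0
        for w in range(num_weights):
            # Gradient magnitudes of one weight across the N batches.
            mags = [abs(batch[l][w]) for batch in grads_per_batch]
            mu = mean(mags)
            sigma = pstdev(mags)  # assumption: population standard deviation
            if sigma > eps:  # assumption: skip weights with no variation
                layer_sum += mu / sigma
        if layer_sum > 0:
            # Add the log of the per-layer sum of mean/std ratios.
            score += math.log(layer_sum)
    return score


# Hypothetical gradients for a 2-layer network over N=2 batches.
grads = [
    [[0.5, -0.2], [0.1, 0.4, -0.3]],  # batch 0: layer 0, layer 1
    [[0.7, -0.1], [0.2, 0.1, -0.6]],  # batch 1: layer 0, layer 1
]
print(zico_score(grads))
```

A weight whose gradient magnitude is large and nearly identical across batches has a large mean-to-standard-deviation ratio, so it pushes the score up; this mirrors the analysis above, where a high mean and a low variance are both favorable.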

Experimental results

Research questions

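RQ1. Can the mean and variance of gradients across training samples serve as a theoretically grounded zero-shot proxy for NAS performance?
RQ2. Does a zero-shot proxy based on gradient statistics consistently outperform the simple #Params proxy across diverse NAS topologies and tasks?
RQ3. Can ZiCo identify architectures with competitive test accuracy under various FLOPs budgets at minimal search cost?
RQ4. How does ZiCo compare with one-shot and multi-shot NAS methods on large-scale tasks such as ImageNet?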

Key results

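ZiCo correlates with test accuracy more strongly than existing proxies (including #Params) across multiple datasets in the NASBench101 and NATS-Bench search spaces.
Under FLOPs budgets of 450M to 1000M on ImageNet, ZiCo-based zero-shot NAS reaches Top-1 accuracy competitive with state-of-the-art NAS methods at a much lower search cost (about 0.4 GPU days).
ZiCo can be computed reliably from just two training batches, which allows fast evaluation of candidate architectures.
ZiCo enables NAS that finds architectures competitive with one-shot and multi-shot NAS at comparable FLOPs while requiring far less training time.
Ablation experiments suggest that using more batches to compute ZiCo does not improve the correlation and that a batch size of 64 stabilizes the metric.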
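
The correlation results above compare the ranking of architectures induced by a proxy with their ranking by test accuracy. The review does not state which coefficient is used; as one common choice, the sketch below computes Kendall's tau (the tau-a variant, which does not correct for ties). The function name and the sample numbers are hypothetical and are not results from the paper.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1  # both orderings agree on this pair
        elif sign < 0:
            discordant += 1  # the orderings disagree on this pair
    num_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / num_pairs


# Hypothetical proxy scores and test accuracies for four candidates.
proxy_scores = [12.1, 15.4, 9.8, 14.0]
test_accuracy = [0.71, 0.78, 0.66, 0.74]
print(kendall_tau(proxy_scores, test_accuracy))
```

A value near 1 means the proxy orders architectures almost exactly as their trained accuracy does, which is the property a zero-shot proxy needs in order to replace training during search.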

This review was generated by AI and reviewed by a human editor.