QUICK REVIEW

[Paper Review] NAT: Neural Architecture Transformer for Accurate and Compact Architectures

Yong Guo, Yin Zheng|arXiv (Cornell University)|2019. 10. 31.

Advanced Neural Network Applications | Citations: 73

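One-Line Summary

NAT casts architecture optimization as an MDP, replaces redundant operations with more efficient ones, and produces more accurate and more compact versions of both hand-crafted and NAS-based architectures on CIFAR-10 and ImageNet.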

ABSTRACT

Designing effective architectures is one of the key factors behind the success of deep neural networks. Existing deep architectures are either manually designed or automatically searched by some Neural Architecture Search (NAS) methods. However, even a well-searched architecture may still contain many non-significant or redundant modules or operations (e.g., convolution or pooling), which may not only incur substantial memory consumption and computation cost but also deteriorate the performance. Thus, it is necessary to optimize the operations inside an architecture to improve the performance without introducing extra computation cost. Unfortunately, such a constrained optimization problem is NP-hard. To make the problem feasible, we cast the optimization problem into a Markov decision process (MDP) and seek to learn a Neural Architecture Transformer (NAT) to replace the redundant operations with the more computationally efficient ones (e.g., skip connection or directly removing the connection). Based on MDP, we learn NAT by exploiting reinforcement learning to obtain the optimization policies w.r.t. different architectures. To verify the effectiveness of the proposed strategies, we apply NAT on both hand-crafted architectures and NAS based architectures. Extensive experiments on two benchmark datasets, i.e., CIFAR-10 and ImageNet, demonstrate that the transformed architecture by NAT significantly outperforms both its original form and those architectures optimized by existing methods.

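Motivation and Objectives

Motivates the need to prune non-significant or redundant modules within an architecture in order to improve performance and reduce computation.
Proposes a general architecture optimizer that transforms a given architecture without adding computation cost.
Formulates architecture optimization as an MDP and learns a policy that selectively replaces operations with skip connections or null edges.
Uses a graph convolutional network to capture adjacency information and guide operation transformations.
Demonstrates effectiveness on both hand-crafted and NAS-based architectures across CIFAR-10 and ImageNet.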

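Proposed Method

Models an architecture as a DAG whose edges are categorized as N (null), S (skip connection), or O (other operations), and defines the cost ordering c(O) > c(S) > c(N).
Casts the optimization as a one-step Markov decision process (MDP) and learns a policy that transforms an input architecture β into an architecture α while preserving or reducing cost.
Parameterizes the policy with a Graph Convolutional Network (GCN) that captures local graph structure for edge-level operation decisions.
Trains with policy gradient and entropy regularization to encourage exploration and diverse architecture transformations.
Uses parameter sharing, building one large shared computation graph so that a single NAT can be trained across multiple architectures.
Infers the optimized architecture by sampling several α from the learned policy and selecting the one with the best validation accuracy.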
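
The edge categories, cost ordering, and sample-then-select inference described above can be illustrated with a small sketch. This is not the authors' implementation: the per-edge `logits` stand in for the output of a learned GCN policy, `toy_score` is a placeholder for validation accuracy, and the transition rule (an edge may only move to a category of equal or lower cost) is an assumption taken from this review's wording; the paper's exact transition set may differ. All function names are hypothetical.

```python
import math
import random

# Edge categories as described in this review:
# "N" = null (no connection), "S" = skip connection, "O" = any other operation.
# The ranks encode the ordering c(O) > c(S) > c(N).
COST_RANK = {"N": 0, "S": 1, "O": 2}


def allowed_targets(op):
    """Categories whose cost does not exceed that of `op` (assumed rule)."""
    return [t for t in COST_RANK if COST_RANK[t] <= COST_RANK[op]]


def sample_transform(beta, logits, rng):
    """Sample a transformed architecture alpha, one decision per edge.

    beta:   dict mapping edge (i, j) -> current category
    logits: dict mapping edge (i, j) -> {"N": float, "S": float, "O": float},
            a stand-in for the scores a learned policy would produce
    """
    alpha = {}
    for edge, op in beta.items():
        targets = allowed_targets(op)
        # Unnormalized softmax weights restricted to the allowed targets.
        weights = [math.exp(logits[edge][t]) for t in targets]
        alpha[edge] = rng.choices(targets, weights=weights, k=1)[0]
    return alpha


def infer_best(beta, logits, score_fn, num_samples=10, seed=0):
    """Draw several candidates and keep the one with the highest score."""
    rng = random.Random(seed)
    candidates = [sample_transform(beta, logits, rng) for _ in range(num_samples)]
    return max(candidates, key=score_fn)


# Toy usage: three edges of a small DAG with uniform logits.
beta = {(0, 2): "O", (1, 2): "S", (0, 3): "N"}
logits = {edge: {"N": 0.0, "S": 0.0, "O": 0.0} for edge in beta}


def toy_score(alpha):
    # Placeholder for validation accuracy: counts skip connections.
    return sum(1 for op in alpha.values() if op == "S")


best = infer_best(beta, logits, toy_score)
```

In a real setting the score function would train or evaluate each candidate on a validation set, which is far more expensive than the placeholder used here.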

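Experimental Results

Research Questions

RQ1: Can NAT reliably transform an arbitrary architecture into a more accurate and/or more compact form without extra computation cost?
RQ2: Does NAT deliver consistent improvements for both hand-crafted networks (e.g., VGG, ResNet, MobileNet) and NAS-derived models (e.g., DARTS, ENAS, NAONet)?
RQ3: Is a GCN-based policy better than an LSTM or random search for architecture transformation?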

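Key Results

NAT consistently improves hand-crafted models at comparable computation cost, raising ImageNet Top-1 accuracy by up to 2.75% in the VGG case.
For NAS-based models, NAT reduces parameters by about 20% and improves ImageNet Top-1 accuracy by 0.6% for certain baselines.
Across CIFAR-10 and ImageNet, NAT-transformed architectures outperform both the original models and the NAO-optimized baselines in most cases.
The sampling-based GCN policy yields architectures that perform better on validation than those from random search, LSTM, and Maximum-GCN.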

This review was generated by AI and reviewed by a human editor.