QUICK REVIEW

[Paper Review] NAT: Neural Architecture Transformer for Accurate and Compact Architectures

Yong Guo, Yin Zheng|arXiv (Cornell University)|Oct 31, 2019

Advanced Neural Network Applications | Cited by 73

One-Sentence Summary

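NAT casts architecture optimization as an MDP and learns to replace redundant operations with more efficient ones, yielding more accurate and more compact versions of both hand-crafted and NAS-based models on CIFAR-10 and ImageNet.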

ABSTRACT

Designing effective architectures is one of the key factors behind the success of deep neural networks. Existing deep architectures are either manually designed or automatically searched by some Neural Architecture Search (NAS) methods. However, even a well-searched architecture may still contain many non-significant or redundant modules or operations (e.g., convolution or pooling), which may not only incur substantial memory consumption and computation cost but also deteriorate the performance. Thus, it is necessary to optimize the operations inside an architecture to improve the performance without introducing extra computation cost. Unfortunately, such a constrained optimization problem is NP-hard. To make the problem feasible, we cast the optimization problem into a Markov decision process (MDP) and seek to learn a Neural Architecture Transformer (NAT) to replace the redundant operations with the more computationally efficient ones (e.g., skip connection or directly removing the connection). Based on MDP, we learn NAT by exploiting reinforcement learning to obtain the optimization policies w.r.t. different architectures. To verify the effectiveness of the proposed strategies, we apply NAT on both hand-crafted architectures and NAS based architectures. Extensive experiments on two benchmark datasets, i.e., CIFAR-10 and ImageNet, demonstrate that the transformed architecture by NAT significantly outperforms both its original form and those architectures optimized by existing methods.
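The replacement rule in the abstract (swap an operation for a skip connection or drop the edge, never adding cost) can be sketched as a per-edge transition check. This is a minimal sketch, assuming edges are labeled with the three categories N (null edge), S (skip connection) and O (other operation) used in the method summary; the numeric costs in the hypothetical `COST` table are made up, and only their ordering c(O) > c(S) > c(N) is taken from the summary.

```python
from enum import Enum


class Op(Enum):
    """Edge categories: N = null edge, S = skip connection, O = any other operation."""
    N = 0
    S = 1
    O = 2


# Hypothetical relative costs; only the ordering c(O) > c(S) > c(N) comes from the summary.
COST = {Op.N: 0, Op.S: 1, Op.O: 10}


def allowed_targets(op):
    """Edge types that an edge of type `op` may become without increasing its cost."""
    return [t for t in Op if COST[t] <= COST[op]]


def cell_cost(edges):
    """Total cost of a cell described as a list of edge types."""
    return sum(COST[e] for e in edges)


def is_valid_transform(beta, alpha):
    """True if every edge of beta moves to an equal-or-cheaper type in alpha."""
    return len(alpha) == len(beta) and all(
        a in allowed_targets(b) for b, a in zip(beta, alpha)
    )


# Usage: a cell with two O edges, one skip connection and one null edge.
beta = [Op.O, Op.O, Op.S, Op.N]
alpha = [Op.O, Op.S, Op.N, Op.N]
print(is_valid_transform(beta, alpha), cell_cost(beta), cell_cost(alpha))  # → True 21 11
```

Under this rule an O edge may keep its operation or become S or N, an S edge may become N, and an N edge stays N, so any valid transform costs no more than its input.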

Research Motivation and Goals

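Establish the need to prune non-significant or redundant modules from an architecture, both to improve performance and to reduce computation.
Propose a general architecture optimizer that can transform any given architecture without increasing its computational cost.
Model architecture optimization as an MDP and learn a policy that selectively replaces operations with skip connections or null edges.
Use a graph convolutional network (GCN) to capture adjacency information and guide the operation transformations.
Demonstrate effectiveness on both hand-crafted and NAS-based architectures on CIFAR-10 and ImageNet.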
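The GCN mentioned above propagates information along the cell's edges so that each edge decision can take its neighbourhood into account. Below is a minimal pure-Python sketch of one graph-convolution layer, assuming the standard Kipf and Welling propagation rule ReLU(D^{-1/2}(A + I)D^{-1/2} H W) with the DAG treated as undirected. The summary does not specify NAT's exact GCN, so the normalization, the node features `h`, and the weights `w` are illustrative assumptions.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, treating the DAG edges as undirected."""
    n = len(adj)
    # Symmetrize and add self-loops so each node also keeps its own features.
    a = [[1.0 if i == j or adj[i][j] or adj[j][i] else 0.0 for j in range(n)]
         for i in range(n)]
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj, h, w):
    """One graph-convolution step: ReLU(A_hat @ H @ W)."""
    out = matmul(matmul(normalized_adjacency(adj), h), w)
    return [[max(0.0, x) for x in row] for row in out]


# Usage: a hypothetical 4-node cell (adj[i][j] = 1 means an edge i -> j).
adj = [[0, 1, 1, 0],
       [0, 0, 1, 0],
       [0, 0, 0, 1],
       [0, 0, 0, 0]]
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]  # illustrative node features
w = [[0.5, -1.0], [1.0, 0.5]]                          # illustrative layer weights
print(gcn_layer(adj, h, w))
```

In a NAT-style policy, node embeddings like these would feed a per-edge classifier over {N, S, O}; that head is not shown here.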

Proposed Method

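Represent an architecture as a directed acyclic graph (DAG) whose edges fall into three categories, N (null edge), S (skip connection), or O (other operations), with the cost ordering c(O) > c(S) > c(N).
Formulate the optimization as a one-step Markov decision process and learn a policy that transforms an input architecture β into an optimized architecture α whose cost is the same or lower.
Parameterize the policy with a graph convolutional network (GCN) that captures the local graph structure behind each edge-level operation decision.
Train the policy with policy gradients plus entropy regularization, which encourages exploration and diverse architecture transformations.
Train a single NAT across many architectures with parameter sharing, by building one large shared computational graph.
Infer the optimized architecture by sampling several candidate α from the learned policy and keeping the one with the best validation accuracy.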
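The training and inference steps above can be illustrated with REINFORCE on a toy problem. This is a minimal sketch, not the authors' implementation: it replaces the GCN with free per-edge logits, restricts each edge to cost-non-increasing targets (O to O/S/N, S to S/N, N to N), adds an entropy bonus to the policy-gradient update, and uses a hypothetical `toy_reward` in place of validation accuracy. The names `EdgePolicy`, `train`, and `best_of_k` are illustrative, and parameter sharing across architectures is not modeled.

```python
import math
import random

# Cost-non-increasing targets per edge type: N = null, S = skip, O = other op.
ALLOWED = {"N": ["N"], "S": ["N", "S"], "O": ["N", "S", "O"]}


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class EdgePolicy:
    """Independent categorical policy per edge (simplification: no GCN)."""

    def __init__(self, beta):
        self.choices = [ALLOWED[b] for b in beta]
        self.logits = [[0.0] * len(c) for c in self.choices]

    def probs(self):
        return [softmax(z) for z in self.logits]

    def sample(self, rng):
        return [rng.choices(range(len(c)), weights=p)[0]
                for c, p in zip(self.choices, self.probs())]

    def decode(self, idx):
        return [c[i] for c, i in zip(self.choices, idx)]


def train(policy, reward_fn, steps=2000, lr=0.2, ent_coef=0.01, seed=0):
    """REINFORCE with a moving-average baseline and an entropy bonus."""
    rng = random.Random(seed)
    baseline = 0.0
    for _ in range(steps):
        idx = policy.sample(rng)
        r = reward_fn(policy.decode(idx))
        advantage = r - baseline
        baseline = 0.9 * baseline + 0.1 * r
        for e, p in enumerate(policy.probs()):
            ent = -sum(q * math.log(q) for q in p if q > 0)
            for j, q in enumerate(p):
                grad_logp = (1.0 if j == idx[e] else 0.0) - q          # d log pi(a_e) / d z_j
                grad_ent = -q * (math.log(q) + ent) if q > 0 else 0.0  # d H / d z_j
                policy.logits[e][j] += lr * (advantage * grad_logp + ent_coef * grad_ent)
    return policy


def best_of_k(policy, reward_fn, k=8, seed=1):
    """Inference: sample k candidate cells and keep the one with the highest reward."""
    rng = random.Random(seed)
    candidates = [policy.decode(policy.sample(rng)) for _ in range(k)]
    return max(candidates, key=reward_fn)


# Usage with a hypothetical target cell that the toy reward prefers.
beta = ["O", "O", "S", "N"]
target = ["O", "S", "N", "N"]


def toy_reward(alpha):
    """Hypothetical stand-in for validation accuracy: fraction of edges matching `target`."""
    return sum(a == t for a, t in zip(alpha, target)) / len(target)


policy = train(EdgePolicy(beta), toy_reward)
print(best_of_k(policy, toy_reward))
```

The entropy term keeps some probability on alternative operations during training, which is what makes sampling several candidates at inference time worthwhile.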

Experimental Results

Research Questions

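RQ1: Can NAT reliably transform an arbitrary architecture into a more accurate and/or more compact form without adding computational cost?
RQ2: Does NAT deliver consistent improvements on both hand-crafted networks (e.g., VGG, ResNet, MobileNet) and NAS-derived models (e.g., DARTS, ENAS, NAONet)?
RQ3: Does a GCN-based policy transform architectures better than an LSTM-based policy or random search?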

Key Findings

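NAT consistently improves hand-crafted models at comparable computational cost, with Top-1 accuracy gains on ImageNet of up to 2.75% in the VGG setting.
For NAS-based models, NAT reduces the parameter count by roughly 20% and, for some baselines, improves ImageNet Top-1 accuracy by 0.6%.
On both CIFAR-10 and ImageNet, NAT-transformed architectures outperform the original architectures in most cases and also outperform baselines optimized by NAO.
The sampling-based GCN policy finds architectures with better validation performance than random search, LSTM, and Maximum-GCN.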
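One way to see how sampling can beat Maximum-GCN, reading Maximum-GCN as taking the most probable operation on every edge independently: when a cell's quality depends on how its edges combine, the per-edge argmax can miss the best combination, while sampling several cells and validating them can still find it. This toy is purely illustrative; the probabilities in `PROBS` and the scores in `SCORE` are hypothetical and are not results from the paper.

```python
import random

CHOICES = ["S", "O"]
# Hypothetical per-edge probabilities from a trained policy, ordered as CHOICES.
PROBS = [[0.4, 0.6], [0.4, 0.6]]
# Hypothetical, non-separable validation scores for each two-edge cell.
SCORE = {("O", "S"): 0.9, ("S", "O"): 0.8, ("O", "O"): 0.7, ("S", "S"): 0.5}


def maximum_decode(probs):
    """Maximum-style decoding: most probable choice on every edge independently."""
    return tuple(CHOICES[p.index(max(p))] for p in probs)


def sample_decode(probs, k, seed=0):
    """Sampling-style decoding: draw k cells and keep the best-scoring one."""
    rng = random.Random(seed)
    candidates = [tuple(rng.choices(CHOICES, weights=p)[0] for p in probs)
                  for _ in range(k)]
    return max(candidates, key=SCORE.__getitem__)


greedy = maximum_decode(PROBS)
sampled = sample_decode(PROBS, k=64)
print(greedy, SCORE[greedy], sampled, SCORE[sampled])
```

Here `maximum_decode(PROBS)` returns `("O", "O")` with score 0.7, whereas `sample_decode` with k = 64 finds `("O", "S")` (score 0.9) with high probability, since each draw hits that cell with probability 0.6 × 0.4 = 0.24.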

This review was generated by AI and reviewed by human editors.