QUICK REVIEW

[Paper Review] Efficient Architecture Search by Network Transformation

Han Cai, Tianyao Chen|arXiv (Cornell University)|Jul 16, 2017
Advanced Neural Network Applications|Cited by 323
One-Sentence Summary

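This paper introduces Efficient Architecture Search (EAS), which uses function-preserving network transformations guided by a reinforcement learning meta-controller to reuse weights and explore CNN architectures efficiently.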

ABSTRACT

Techniques for automatically designing deep neural network architectures such as reinforcement learning based approaches have recently shown promising results. However, their success is based on vast computational resources (e.g. hundreds of GPUs), making them difficult to be widely used. A noticeable limitation is that they still design and train each network from scratch during the exploration of the architecture space, which is highly inefficient. In this paper, we propose a new framework toward efficient architecture search by exploring the architecture space based on the current network and reusing its weights. We employ a reinforcement learning agent as the meta-controller, whose action is to grow the network depth or layer width with function-preserving transformations. As such, the previously validated networks can be reused for further exploration, thus saves a large amount of computational cost. We apply our method to explore the architecture space of the plain convolutional neural networks (no skip-connections, branching etc.) on image benchmark datasets (CIFAR-10, SVHN) with restricted computational resources (5 GPUs). Our method can design highly competitive networks that outperform existing networks using the same design scheme. On CIFAR-10, our model without skip-connections achieves 4.23% test error rate, exceeding a vast majority of modern architectures and approaching DenseNet. Furthermore, by applying our method to explore the DenseNet architecture space, we are able to achieve more accurate networks with fewer parameters.

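Motivation and Goals

  • Reduce the computational cost of automated architecture design by reusing already trained networks.
  • Propose a framework (EAS) that applies function-preserving transformations to deepen or widen a network.
  • Use a reinforcement learning agent to decide which transformation operations are useful.
  • Demonstrate efficiency and competitive accuracy on CIFAR-10 and SVHN with a limited number of GPUs.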

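Proposed Method

  • Model architecture search as a sequential decision process in which the state is the current network and the action is a network transformation operation.
  • Use Net2WiderNet and Net2DeeperNet as the main function-preserving transformations, which widen existing layers or insert new layers while preserving the function the network computes.
  • Extend the Net2Net operations to DenseNet by adapting the transformations to layers with multiple input paths.
  • Represent the current architecture with a bidirectional LSTM encoder, and use multiple actor networks to propose transformation actions.
  • Train the RL meta-controller with REINFORCE, using the accuracy of the transformed network as the reward and a moving baseline to reduce variance.
  • Run experiments on CIFAR-10 and SVHN over both the plain CNN space and the DenseNet space under restricted resources (5 GPUs).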
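The function-preserving idea behind Net2WiderNet and Net2DeeperNet can be illustrated with a minimal sketch on a small fully connected ReLU network. This is a toy illustration under stated assumptions, not the paper's implementation: it uses plain Python lists instead of convolutional layers, omits biases, and the helper names (`forward`, `net2wider`, `net2deeper`) are hypothetical.

```python
import random

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, a) for a in v]

def forward(layers, x):
    """Plain MLP: ReLU after every layer except the last. Biases omitted."""
    h = x
    for W in layers[:-1]:
        h = relu(matvec(W, h))
    return matvec(layers[-1], h)

def net2wider(layers, idx, new_width, rng):
    """Widen hidden layer `idx` (hypothetical helper, toy version).

    Each extra unit copies a randomly chosen existing unit. The outgoing
    weights of every replicated unit are divided by its replication count,
    so the widened network computes the same function.
    """
    W_in, W_out = layers[idx], layers[idx + 1]
    old_width = len(W_in)
    # mapping[j] = index of the original unit that new unit j copies
    mapping = list(range(old_width)) + [
        rng.randrange(old_width) for _ in range(new_width - old_width)
    ]
    counts = [mapping.count(i) for i in range(old_width)]
    wider_in = [list(W_in[src]) for src in mapping]
    wider_out = [[row[src] / counts[src] for src in mapping] for row in W_out]
    return layers[:idx] + [wider_in, wider_out] + layers[idx + 2:]

def net2deeper(layers, idx):
    """Insert an identity-initialized layer after hidden layer `idx`.

    The input to the new layer is already non-negative (it follows a ReLU),
    so relu(I * h) == h and the network output is unchanged.
    """
    width = len(layers[idx])
    identity = [[1.0 if i == j else 0.0 for j in range(width)]
                for i in range(width)]
    return layers[:idx + 1] + [identity] + layers[idx + 1:]

rng = random.Random(0)
W1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]  # 3 -> 4
W2 = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]  # 4 -> 2
base = [W1, W2]
x = [0.5, -1.0, 2.0]

wider = net2wider(base, idx=0, new_width=6, rng=rng)  # hidden width 4 -> 6
deeper = net2deeper(wider, idx=0)                     # 2 layers -> 3 layers

y_base = forward(base, x)
y_wider = forward(wider, x)
y_deeper = forward(deeper, x)
```

All three networks should produce the same output for `x` up to floating-point rounding, which is what allows a transformed network to start from its parent's accuracy rather than from scratch.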
Figure 1: Overview of the RL based meta-controller in EAS, which consists of an encoder network for encoding the architecture and multiple separate actor networks for taking network transformation actions.
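As a rough illustration of how a meta-controller can be trained with REINFORCE and a moving baseline, the sketch below updates a softmax policy over three discrete actions. Everything specific here is an assumption made for the example: the reward values in `TOY_REWARDS` are invented, the baseline is an exponential moving average with an arbitrary decay, and the policy is a bare vector of logits rather than the encoder and actor networks shown in Figure 1.

```python
import math
import random

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_update(logits, action, reward, baseline, lr=0.1):
    """One REINFORCE step for a softmax policy over discrete actions.

    The gradient of log pi(action) with respect to the logits is
    onehot(action) - probs; it is scaled by the advantage (reward - baseline).
    """
    probs = softmax(logits)
    advantage = reward - baseline
    return [
        z + lr * advantage * ((1.0 if i == action else 0.0) - p)
        for i, (z, p) in enumerate(zip(logits, probs))
    ]

def train(rewards, steps=2000, decay=0.9, seed=0):
    """Sample actions, apply REINFORCE, and track a moving-average baseline."""
    rng = random.Random(seed)
    logits = [0.0] * len(rewards)
    baseline = None
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(rewards)), weights=probs)[0]
        reward = rewards[action]
        if baseline is None:
            baseline = reward  # initialize the baseline with the first reward
        logits = reinforce_update(logits, action, reward, baseline)
        baseline = decay * baseline + (1 - decay) * reward
    return softmax(logits), baseline

# Invented stand-ins for the accuracy obtained after three candidate
# transformations (assumption for this sketch only).
TOY_REWARDS = [0.2, 0.5, 0.9]
final_probs, final_baseline = train(TOY_REWARDS)
```

Subtracting the baseline leaves the expected gradient unchanged but centers the reward signal, so actions are reinforced only when they do better than the recent average.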

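Experimental Results

Research Questions

  • RQ1: Can function-preserving transformations explore the architecture space efficiently without retraining from scratch?
  • RQ2: How well can the RL-based meta-controller learn to widen layers or insert layers in order to improve validation performance?
  • RQ3: With limited resources, do the transformations generalize across plain CNN and DenseNet-style structures?
  • RQ4: What performance gains are obtained on CIFAR-10 and SVHN relative to baseline architectures and prior automated design methods?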

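Key Findings

  • With 5 GPUs, EAS finds a competitive plain CNN that reaches 4.23% test error on CIFAR-10 with data augmentation.
  • In the DenseNet space, EAS achieves 4.66% test error on CIFAR-10 and 3.44% on CIFAR-10+, with fewer parameters than some baselines.
  • The RL-based meta-controller outperforms random search at discovering high-performing structures.
  • On SVHN, the best plain CNN found by EAS reaches 1.73% test error after training, better than many automatically designed models in the same space.
  • By reusing weights, EAS lowers the computational burden and needs far fewer GPUs than earlier large-scale NAS methods.
  • DenseNet exploration with EAS reaches 3.44% test error on CIFAR-10+ with 10.7M parameters, outperforming several DenseNet variants.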
Figure 2: Net2Wider actor, which uses a shared sigmoid classifier to simultaneously determine whether to widen each layer based on its hidden state given by the encoder network.
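The shared sigmoid classifier described in the Figure 2 caption can be sketched as follows. This is a simplified illustration under assumptions, not the paper's code: the per-layer hidden states, the weight vector `w`, and the bias `b` are made-up numbers standing in for the encoder output and learned parameters, and the function names are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def widen_probabilities(hidden_states, w, b):
    """Apply ONE shared sigmoid classifier to every layer's hidden state."""
    return [
        sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + b)
        for h in hidden_states
    ]

def sample_widen_decisions(probs, rng):
    """Sample an independent widen / keep decision for each layer.

    Returns the decisions and the log-probability of the joint action,
    which is the quantity a policy-gradient method would differentiate.
    """
    decisions = [rng.random() < p for p in probs]
    log_prob = sum(
        math.log(p if d else 1.0 - p) for p, d in zip(probs, decisions)
    )
    return decisions, log_prob

# Made-up encoder hidden states for a 3-layer network (assumption).
hidden_states = [[0.0, 0.0], [2.0, 0.0], [-2.0, 0.0]]
w, b = [1.0, -0.5], 0.0  # made-up shared classifier parameters

probs = widen_probabilities(hidden_states, w, b)
decisions, log_prob = sample_widen_decisions(probs, random.Random(0))
```

Because the classifier parameters are shared across positions, the same actor can score networks of any depth: a deeper network simply yields a longer list of hidden states.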


This review was generated by AI and reviewed by human editors.