QUICK REVIEW

[Paper Review] Neural Optimizer Search with Reinforcement Learning

Irwan Bello, Barret Zoph | arXiv (Cornell University) | Sep 21, 2017

Advanced Neural Network Applications | References: 33 | Cited by: 201

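One-Sentence Summary

The paper automates the discovery of neural network optimizers by training an RNN controller to generate update equations in a DSL, optimizes the controller with PPO-based reinforcement learning, and shows that the discovered rules transfer across tasks.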

ABSTRACT

We present an approach to automate the process of discovering optimization methods, with a focus on deep learning architectures. We train a Recurrent Neural Network controller to generate a string in a domain specific language that describes a mathematical update equation based on a list of primitive functions, such as the gradient, running average of the gradient, etc. The controller is trained with Reinforcement Learning to maximize the performance of a model after a few epochs. On CIFAR-10, our method discovers several update rules that are better than many commonly used optimizers, such as Adam, RMSProp, or SGD with and without Momentum on a ConvNet model. We introduce two new optimizers, named PowerSign and AddSign, which we show transfer well and improve training on a variety of different tasks and architectures, including ImageNet classification and Google's neural machine translation system.

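Research Motivation and Goals

Motivate and automate the design of optimizer update rules for deep learning.
Represent update rules in a domain-specific language so that they can be composed flexibly.
Use reinforcement learning to optimize update rules against validation performance.
Show that the discovered optimizers transfer across architectures and tasks.
Offer alternatives that are faster and more memory-efficient than conventional optimizers.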

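Proposed Method

Train a recurrent neural network controller to emit DSL strings that describe optimizer update rules.
Use PPO (Proximal Policy Optimization) to maximize the validation accuracy of a target model trained with each sampled rule.
Build a domain-specific language that encodes operands, unary functions, and binary functions, which combine into update equations.
Use a distributed training setup to speed up the search, evaluating each sampled rule on a small convolutional network, with five epochs of training providing the reward signal.
Identify and analyze the resulting update rules, such as PowerSign and AddSign, along with a learning rate decay scheme called linear cosine decay.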
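The way operands, unary functions, and binary functions combine into an update equation can be sketched in a few lines of Python. This is a simplified illustration, not the paper's implementation: the primitive tables, the `evaluate` helper, the tuple encoding, and the scalar operand values are all assumptions made for the example, and the real DSL has a larger vocabulary and operates on tensors.

```python
import math

# Hypothetical primitive tables; the paper's DSL vocabulary is larger.
UNARY = {
    "id": lambda x: x,
    "neg": lambda x: -x,
    "sign": lambda x: (x > 0) - (x < 0),
    "exp": math.exp,
}
BINARY = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}

def evaluate(rule, operands):
    """Evaluate one expression b(u1(op1), u2(op2)) on scalar operands.

    `rule` is a 5-tuple of names (op1, op2, u1, u2, b), standing in for
    the decisions a controller would sample for one expression.
    """
    op1, op2, u1, u2, b = rule
    return BINARY[b](UNARY[u1](operands[op1]), UNARY[u2](operands[op2]))

# Operands at one training step (made-up values): gradient g,
# its moving average m, and the constant 1.
operands = {"g": -0.5, "m": -0.2, "1": 1.0}

# Stage 1: sign(g) * sign(m) is +1 when g agrees in sign with m.
agreement = evaluate(("g", "m", "sign", "sign", "mul"), operands)

# Stage 2: reuse earlier results as new operands to build
# (1 + sign(g) * sign(m)) * g, the AddSign form.
operands["agree"] = agreement
operands["scale"] = evaluate(("1", "agree", "id", "id", "add"), operands)
update = evaluate(("scale", "g", "id", "id", "mul"), operands)
```

Feeding an earlier result back in as an operand is what lets a small set of primitives express nested rules. A controller would sample the tuples instead of having them written by hand.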

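Experimental Results

Research Questions

RQ1: Can a reinforcement-learning-based controller automatically discover optimizer update rules that work well for neural networks?
RQ2: Do the discovered update rules transfer effectively to larger models and different tasks?
RQ3: Which new update rules and learning rate schedules emerge from the search, and how do they compare with SGD, Momentum, RMSProp, and Adam?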

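Key Findings

On a small ConvNet trained on CIFAR-10, the controller discovers update rules that outperform Adam, RMSProp, and SGD with and without Momentum.
Two main families, PowerSign and AddSign, emerge as effective update rules and transfer to larger tasks, including a Wide ResNet on CIFAR-10, ImageNet, GNMT, and PTB language model training, with gains that vary by task.
Linear cosine decay and its variants generally converge faster and tolerate larger initial learning rates.
The discovered rules can be more memory-efficient than Adam, because PowerSign keeps a single moving average per parameter, whereas Adam keeps two.
Replacing the standard optimizer with the discovered rules yields measurable gains in ImageNet top-1/top-5 accuracy and GNMT BLEU.
The learned rules remain robust under moderate changes to their hyperparameters and, in some settings, interpolate toward SGD.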
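To make the findings above concrete, here is a minimal scalar sketch of the two update rules and the decay schedule. It assumes the commonly cited forms: PowerSign scales the gradient by base**(sign(g) * sign(m)), AddSign scales it by (1 + sign(g) * sign(m)), and linear cosine decay multiplies a linear ramp by a cosine term. The function names, the values beta=0.9 and num_periods=0.5, the base learning rate, and the quadratic toy objective are illustrative assumptions, not settings taken from the paper.

```python
import math

def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def linear_cosine_decay(step, total_steps, num_periods=0.5):
    """Assumed form: (1 - t/T) * 0.5 * (1 + cos(2*pi*n*t/T))."""
    frac = min(step, total_steps) / total_steps
    linear = 1.0 - frac
    cosine = 0.5 * (1.0 + math.cos(2.0 * math.pi * num_periods * frac))
    return linear * cosine

def powersign_step(w, g, m, lr, beta=0.9, base=math.e):
    """One scalar PowerSign step; m is the only per-parameter state."""
    m = beta * m + (1.0 - beta) * g            # moving average of the gradient
    w = w - lr * base ** (sign(g) * sign(m)) * g
    return w, m

def addsign_step(w, g, m, lr, beta=0.9):
    """One scalar AddSign step; the update vanishes when g and m disagree."""
    m = beta * m + (1.0 - beta) * g
    w = w - lr * (1.0 + sign(g) * sign(m)) * g
    return w, m

# Toy usage: minimize f(w) = w**2 with PowerSign under the decayed rate.
w, m = 3.0, 0.0
total = 50
for t in range(total):
    g = 2.0 * w                                # gradient of w**2
    lr = 0.05 * linear_cosine_decay(t, total)
    w, m = powersign_step(w, g, m, lr)
```

Both rules take larger steps when the current gradient agrees in sign with its moving average and smaller steps when it disagrees. Each step function carries only `m` as state, which is the source of the memory saving relative to Adam's two accumulators.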


This review was generated by AI and reviewed by a human editor.