QUICK REVIEW

[Paper Review] Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning

Felipe Petroski Such, Vashisht Madhavan|arXiv (Cornell University)|Dec 18, 2017

Reinforcement Learning in Robotics | References: 48 | Cited by: 554

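One-Sentence Summary

The paper shows that a simple, gradient-free genetic algorithm can train deep neural networks for reinforcement learning at the scale used by gradient-based methods, performing competitively on Atari and Humanoid tasks, and that it comes with novel encoding and exploration techniques.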

ABSTRACT

Deep artificial neural networks (DNNs) are typically trained via gradient-based learning algorithms, namely backpropagation. Evolution strategies (ES) can rival backprop-based algorithms such as Q-learning and policy gradients on challenging deep reinforcement learning (RL) problems. However, ES can be considered a gradient-based algorithm because it performs stochastic gradient descent via an operation similar to a finite-difference approximation of the gradient. That raises the question of whether non-gradient-based evolutionary algorithms can work at DNN scales. Here we demonstrate they can: we evolve the weights of a DNN with a simple, gradient-free, population-based genetic algorithm (GA) and it performs well on hard deep RL problems, including Atari and humanoid locomotion. The Deep GA successfully evolves networks with over four million free parameters, the largest neural networks ever evolved with a traditional evolutionary algorithm. These results (1) expand our sense of the scale at which GAs can operate, (2) suggest intriguingly that in some cases following the gradient is not the best choice for optimizing performance, and (3) make immediately available the multitude of neuroevolution techniques that improve performance. We demonstrate the latter by showing that combining DNNs with novelty search, which encourages exploration on tasks with deceptive or sparse reward functions, can solve a high-dimensional problem on which reward-maximizing algorithms (e.g., DQN, A3C, ES, and the GA) fail. Additionally, the Deep GA is faster than ES, A3C, and DQN (it can train Atari in ~4 hours on one desktop or ~1 hour distributed on 720 cores), and enables a state-of-the-art, up to 10,000-fold compact encoding technique.

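Research Motivation and Objectives

Evaluate whether a simple genetic algorithm (GA) can train deep neural networks on large-scale, challenging RL tasks.
Compare the GA against DQN, A3C, and ES on Atari and on the MuJoCo Humanoid locomotion task.
Explore what novelty search and other neuroevolution techniques contribute in deep RL settings.
Demonstrate an efficient, compressed encoding for large networks evolved by the GA.
Study the GA's speed and scalability advantages in single-machine and distributed settings.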

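Proposed Method

Evolve neural network weights with a simple, gradient-free GA that uses truncation selection and elitism.
Mutate offspring with additive Gaussian noise, carry the best individual forward unchanged as the elite, and evaluate the top individuals over multiple episodes to reduce noise in the fitness estimate.
Represent large weight vectors through a seed-based encoding, which keeps them compact and makes distributed training scalable.
Apply novelty search (GA-NS) on deceptive tasks by replacing fitness with behavioral novelty as the selection signal.
Test two experimental settings: Atari from pixel input (networks with 4M+ parameters) and MuJoCo Humanoid locomotion; compare against DQN, ES, and A3C.
Provide both a distributed CPU-based implementation and a GPU-accelerated implementation of the GA to assess wall-clock time and scalability.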
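
The selection and mutation steps listed above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the hyperparameter values, the Gaussian initialization, and the toy fitness function (a stand-in for an episode return) are all assumptions made for illustration, and the multi-episode re-evaluation of top candidates is omitted because the toy fitness is deterministic.

```python
import random


def toy_fitness(theta):
    """Hypothetical stand-in for an episode return; highest when every weight is 1.0."""
    return -sum((w - 1.0) ** 2 for w in theta)


def evolve(fitness, dim, pop_size=20, top_t=5, sigma=0.1, generations=50, seed=0):
    """Gradient-free GA sketch: truncation selection, Gaussian mutation, one elite."""
    rng = random.Random(seed)
    # Initial population of random parameter vectors (initialization is an assumption).
    population = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: only the top_t individuals may become parents.
        parents = sorted(population, key=fitness, reverse=True)[:top_t]
        elite = parents[0]  # elitism: the best individual survives unmodified
        children = []
        for _ in range(pop_size - 1):
            parent = rng.choice(parents)
            # Mutation: additive Gaussian noise on every weight, no crossover.
            children.append([w + sigma * rng.gauss(0.0, 1.0) for w in parent])
        population = [elite] + children
    return max(population, key=fitness)


if __name__ == "__main__":
    best = evolve(toy_fitness, dim=3)
    print("best fitness:", toy_fitness(best))
```

Because the elite is copied unchanged into the next generation, the best fitness in this sketch never decreases from one generation to the next when the fitness function is deterministic.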

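Experimental Results

Research Questions

RQ1: Can a simple GA effectively train deep neural networks at the scale used in deep RL benchmarks (e.g., 4M+ parameters) on Atari and Humanoid tasks?
RQ2: How does the GA perform in these domains compared with the gradient-based methods (DQN, A3C) and with ES?
RQ3: Does novelty search, when combined with the GA, improve exploration and performance on deceptive or high-dimensional tasks?
RQ4: Can large networks evolved by the GA be encoded compactly, enabling efficient distributed training?
RQ5: What wall-clock time and compute cost advantages does the GA have over other deep RL methods?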
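
The compact encoding behind RQ4 rests on a simple idea: instead of storing a full weight vector, store the random seeds that generated it and replay them on demand. The sketch below is one hedged reading of that idea, not the paper's code. The names `decode` and `mutate`, the value of `sigma`, and the Gaussian initialization are hypothetical.

```python
import random


def decode(seeds, dim, sigma=0.01):
    """Rebuild a weight vector by replaying its seed list (sigma is an arbitrary choice)."""
    init_seed, *mutation_seeds = seeds
    rng = random.Random(init_seed)
    # The first seed reproduces the initial weights.
    theta = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    # Each later seed reproduces one Gaussian mutation, applied in order.
    for s in mutation_seeds:
        rng = random.Random(s)
        theta = [w + sigma * rng.gauss(0.0, 1.0) for w in theta]
    return theta


def mutate(seeds, rng):
    """A child is its parent's seed list plus one fresh mutation seed."""
    return seeds + [rng.randrange(2**32)]


if __name__ == "__main__":
    rng = random.Random(0)
    parent = [rng.randrange(2**32)]   # generation 0: the initialization seed only
    child = mutate(parent, rng)       # generation 1: one extra integer
    print(len(child), "integers encode", len(decode(child, dim=1000)), "weights")
```

In this sketch the size of an individual grows with the number of generations rather than with the number of parameters, so workers in a distributed run can exchange short seed lists instead of full weight vectors.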

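Key Findings

GA-trained networks perform comparably to DQN, A3C, and ES across many Atari games, with an advantage on some titles (e.g., Skiing, Frostbite, Venture).
The GA can evolve networks with more than four million parameters, the largest neural networks evolved with a traditional evolutionary algorithm at the time.
GA runs are markedly faster in wall-clock time than DQN and A3C, both on a single desktop (~4 hours with 4 GPUs and 48 CPU cores) and in distributed runs (~1 hour on 720 CPU cores).
Novelty search (GA-NS) solves a high-dimensional, image-based maze task that the reward-only GA and the other baselines cannot solve.
Random search outperforms some gradient-based methods on some games, which suggests that densely sampling the region near the origin can yield strong solutions in some domains.
The GA with novelty-driven exploration shows the value of combining diversity and quality signals in deep neuroevolution, and points toward possible hybrids with deep RL.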
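
To make the novelty signal concrete, here is a minimal sketch of one common way to score it: the mean distance from an individual's behavior to its k nearest neighbors among previously seen behaviors. The function name, the value of k, and the use of a 2-D final position as the behavior descriptor are assumptions for illustration, not details confirmed by this summary.

```python
import math


def novelty(behavior, others, k=2):
    """Mean Euclidean distance from `behavior` to its k nearest neighbors in `others`."""
    if not others:
        return float("inf")  # nothing to compare against, so treat as maximally novel
    dists = sorted(math.dist(behavior, o) for o in others)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)


if __name__ == "__main__":
    # Hypothetical archive of final (x, y) positions reached by earlier policies.
    archive = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    print("near the archive:", novelty((0.1, 0.0), archive))
    print("far from the archive:", novelty((10.0, 10.0), archive))
```

Selecting on this score instead of reward favors policies that end up somewhere new, which is what lets a search escape a deceptive dead end that keeps attracting reward-maximizing methods.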

This review was generated by AI and checked by a human editor.