QUICK REVIEW

[Paper Review] Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning

Felipe Petroski Such, Vashisht Madhavan | arXiv (Cornell University) | Dec 18, 2017

Reinforcement Learning in Robotics | References: 48 | Citations: 554

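One-Line Summary

This paper shows that a simple, gradient-free genetic algorithm can train deep neural networks for reinforcement learning on Atari and Humanoid tasks at a scale on par with gradient-based methods, aided by a compact seed-based encoding and, for exploration, novelty search.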

ABSTRACT

Deep artificial neural networks (DNNs) are typically trained via gradient-based learning algorithms, namely backpropagation. Evolution strategies (ES) can rival backprop-based algorithms such as Q-learning and policy gradients on challenging deep reinforcement learning (RL) problems. However, ES can be considered a gradient-based algorithm because it performs stochastic gradient descent via an operation similar to a finite-difference approximation of the gradient. That raises the question of whether non-gradient-based evolutionary algorithms can work at DNN scales. Here we demonstrate they can: we evolve the weights of a DNN with a simple, gradient-free, population-based genetic algorithm (GA) and it performs well on hard deep RL problems, including Atari and humanoid locomotion. The Deep GA successfully evolves networks with over four million free parameters, the largest neural networks ever evolved with a traditional evolutionary algorithm. These results (1) expand our sense of the scale at which GAs can operate, (2) suggest intriguingly that in some cases following the gradient is not the best choice for optimizing performance, and (3) make immediately available the multitude of neuroevolution techniques that improve performance. We demonstrate the latter by showing that combining DNNs with novelty search, which encourages exploration on tasks with deceptive or sparse reward functions, can solve a high-dimensional problem on which reward-maximizing algorithms (e.g. DQN, A3C, ES, and the GA) fail. Additionally, the Deep GA is faster than ES, A3C, and DQN (it can train Atari in ~4 hours on one desktop or ~1 hour distributed on 720 cores), and enables a state-of-the-art, up to 10,000-fold compact encoding technique.
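The abstract's point that ES is effectively gradient-based can be made concrete with a short sketch. Below, a hypothetical `es_gradient` helper estimates the gradient of a Gaussian-smoothed objective by perturbing the parameters along random directions and weighting each direction by the resulting return, which behaves like a randomized finite difference. The toy quadratic `fitness`, the step size, `sigma`, and the sample count are assumptions for illustration, not values from the paper.

```python
import numpy as np

def es_gradient(f, theta, sigma=0.1, n=200, rng=None):
    """Estimate the gradient of E[f(theta + sigma * eps)], eps ~ N(0, I).

    Each sample probes one random direction, much like a finite difference:
    g ~= 1 / (n * sigma) * sum_i (f(theta + sigma * eps_i) - baseline) * eps_i
    """
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal((n, theta.size))
    returns = np.array([f(theta + sigma * e) for e in eps])
    # Subtracting the mean return (a baseline) reduces the estimator's variance.
    returns -= returns.mean()
    return eps.T @ returns / (n * sigma)

def fitness(w):
    # Hypothetical toy objective, maximized at w = [1, -2, 3].
    return -np.sum((w - np.array([1.0, -2.0, 3.0])) ** 2)

rng = np.random.default_rng(0)
theta = np.zeros(3)
for _ in range(300):
    theta += 0.05 * es_gradient(fitness, theta, rng=rng)  # gradient ascent step
print(np.round(theta, 2))
```

The GA, by contrast, never forms such an estimate: it only keeps whichever perturbed copies score best.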

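Motivation and Objectives

Evaluate whether a plain genetic algorithm (GA) can train deep neural networks on large-scale, challenging reinforcement learning tasks.
Compare the GA's performance against DQN, A3C, and ES on Atari and MuJoCo Humanoid locomotion.
Explore the benefits of novelty search and other neuroevolution techniques in deep RL settings.
Demonstrate an efficient, compressed encoding of the large networks evolved by the GA.
Investigate the GA's speed and scalability advantages in single-machine and distributed settings.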

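Proposed Method

Evolve neural network weights with a simple, gradient-free GA that uses truncation selection and elitism.
Mutate offspring with additive Gaussian noise and carry the best individual over unchanged as the elite; evaluate the top individuals over multiple episodes to reduce the effect of noisy returns.
Represent each large weight vector with a seed-based encoding (an initialization seed plus one random seed per mutation), enabling compact storage and scalable distributed training.
For deceptive tasks, apply novelty search (GA-NS), which replaces the fitness with behavioral novelty.
Test two experimental settings, Atari from pixels (networks with 4M+ parameters) and MuJoCo Humanoid locomotion, comparing against DQN, ES, and A3C.
Include both a distributed CPU-based and a GPU-accelerated GA implementation to evaluate wall-clock time and scalability.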
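The method steps above can be sketched as follows, under stated assumptions: a hypothetical noisy quadratic `episode_return` stands in for an Atari or Humanoid rollout, the "network" is a 10-element weight vector, and all names and hyperparameter values are illustrative rather than the paper's. The sketch shows truncation selection, elitism, additive Gaussian mutation (mutation only, no crossover), re-evaluation of top candidates over extra episodes, and a seed-list encoding in which each individual is stored as its initialization seed plus one seed per mutation.

```python
import numpy as np

# Toy settings (hypothetical values; the paper's Atari networks have 4M+ weights).
DIM = 10                               # number of weights
SIGMA = 0.05                           # mutation power
TARGET = np.linspace(-1.0, 1.0, DIM)   # optimum of the toy fitness

def decode(seeds):
    """Rebuild a weight vector from its seed list.

    seeds[0] seeds the initialization; each later seed regenerates one
    additive Gaussian mutation, so an individual is stored as integers only.
    """
    theta = 0.1 * np.random.default_rng(seeds[0]).standard_normal(DIM)
    for s in seeds[1:]:
        theta += SIGMA * np.random.default_rng(s).standard_normal(DIM)
    return theta

def episode_return(theta, rng):
    # Noisy stand-in for an RL rollout (assumption): higher is better.
    return -np.sum((theta - TARGET) ** 2) + 0.01 * rng.standard_normal()

def deep_ga(pop_size=50, truncation=10, generations=50, top_k=5,
            extra_evals=5, seed=0):
    rng = np.random.default_rng(seed)

    def fresh_seed():
        return int(rng.integers(2**31))

    population = [[fresh_seed()] for _ in range(pop_size)]
    for _ in range(generations):
        thetas = [decode(ind) for ind in population]
        scores = [episode_return(t, rng) for t in thetas]
        order = np.argsort(scores)[::-1]                       # best first
        parents = [population[i] for i in order[:truncation]]  # truncation selection
        # Re-evaluate the top_k parents over extra episodes to pick a less noisy elite.
        means = [np.mean([episode_return(thetas[i], rng) for _ in range(extra_evals)])
                 for i in order[:top_k]]
        elite = parents[int(np.argmax(means))]
        # Elitism: copy the elite unchanged; each child appends one new mutation seed.
        children = [parents[rng.integers(truncation)] + [fresh_seed()]
                    for _ in range(pop_size - 1)]
        population = [elite] + children
    return elite

best = deep_ga()
print(len(best), "seeds encode", DIM, "weights")
print("final squared error:", np.sum((decode(best) - TARGET) ** 2))
```

The seed list grows by one integer per generation while the weight count stays fixed, so storage scales with the number of generations rather than the number of parameters. In this 10-weight toy the list is actually longer than the weights; the savings only appear when a network has millions of parameters.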

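Experimental Results

Research Questions

RQ1: Can a simple GA effectively train deep neural networks on Atari and Humanoid tasks at the scale used in deep RL benchmarks (e.g., 4M+ parameters)?
RQ2: How does the GA's performance in these domains compare with gradient-based methods (DQN, A3C) and with ES?
RQ3: On deceptive or high-dimensional tasks, does combining novelty search with the GA improve exploration and performance?
RQ4: Can the large networks evolved by the GA be encoded compactly, enabling efficient distributed training?
RQ5: What advantages in wall-clock time and computational cost does the GA offer over other deep RL methods?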

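Key Findings

Networks trained by the GA perform comparably to DQN, A3C, and ES on several Atari games and outperform them on some titles (e.g., Skiing, Frostbite, Venture).
The GA can evolve networks with over 4M parameters, at the time the largest neural networks ever evolved with a traditional evolutionary algorithm.
GA runs are substantially faster in wall-clock time than DQN and A3C: about 4 hours on a single desktop (4 GPUs / 48 CPU cores) or about 1 hour distributed across 720 CPU cores.
Novelty search (GA-NS) solves a high-dimensional, image-based maze that the reward-only GA and the other baselines fail to solve.
Random search can outperform some gradient-based methods on certain games, suggesting that in some domains densely sampling the region near the origin can already yield strong solutions.
Combining the GA with novelty-driven exploration shows the value of integrating diversity and quality signals in deep neuroevolution and points to possible hybrids with deep RL methods.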
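Novelty search, which GA-NS uses in place of reward (see the proposed method and the maze result above), scores an individual by how different its behavior is from behaviors seen so far. A common definition, assumed here, is the mean distance to the k nearest neighbors in behavior space among the current population and an archive. The sketch below takes each behavior to be a 2-D final position (e.g., where an agent ends up in a maze); the function name `novelty_scores`, the value of k, and the archiving rule are hypothetical.

```python
import numpy as np

def novelty_scores(pop_bcs, archive_bcs, k=10):
    """Return one novelty score per individual in pop_bcs.

    pop_bcs:     (n, d) behavior characterizations of the current population.
    archive_bcs: (m, d) behaviors stored from earlier generations (m may be 0).
    Score = mean Euclidean distance to the k nearest *other* behaviors.
    Assumes n - 1 + m >= k so every score is finite.
    """
    reference = np.vstack([pop_bcs, archive_bcs])
    # Pairwise distances: row i = individual i, columns = all reference behaviors.
    dists = np.linalg.norm(pop_bcs[:, None, :] - reference[None, :, :], axis=2)
    # Individual i is also row i of reference; ignore its zero self-distance.
    n = len(pop_bcs)
    dists[np.arange(n), np.arange(n)] = np.inf
    dists.sort(axis=1)
    return dists[:, :k].mean(axis=1)

# Usage: three agents end near (0, 0); one reaches an unexplored spot.
pop = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])
archive = np.empty((0, 2))
scores = novelty_scores(pop, archive, k=2)

# GA-NS ranks individuals by novelty instead of reward for truncation selection.
ranked = np.argsort(scores)[::-1]
# Hypothetical archiving rule: store the single most novel behavior.
archive = np.vstack([archive, pop[ranked[:1]]])
print(scores.round(3), ranked[0])
```

Because selection ignores reward entirely, the isolated agent at (5, 5) is favored even if its return is no better, which is what lets GA-NS escape the deceptive local optima that trap reward-maximizing methods.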


This review was generated by AI and checked by a human editor.