QUICK REVIEW

[Paper Review] Hyperparameter Tuning for Deep Reinforcement Learning Applications

Mariam Kiran, Melis Ozyildirim|arXiv (Cornell University)|Jan 26, 2022

Machine Learning and Data Classification | Cited by 20

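One-Line Summary

This paper introduces HPS-RL, a distributed variable-length genetic algorithm framework that automatically tunes deep reinforcement learning hyperparameters against multiple objectives, across several algorithms and Gym environments, aiming for faster training and more robust deployment.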

ABSTRACT

Reinforcement learning (RL) applications, where an agent can simply learn optimal behaviors by interacting with the environment, are quickly gaining tremendous success in a wide variety of applications from controlling simple pendulums to complex data centers. However, setting the right hyperparameters can have a huge impact on the deployed solution performance and reliability in the inference models, produced via RL, used for decision-making. Hyperparameter search itself is a laborious process that requires many iterations and is computationally expensive to find the best settings that produce the best neural network architectures. In comparison to other neural network architectures, deep RL has not witnessed much hyperparameter tuning, due to its algorithm complexity and the simulation platforms needed. In this paper, we propose a distributed variable-length genetic algorithm framework to systematically tune hyperparameters for various RL applications, improving training time and robustness of the architecture, via evolution. We demonstrate the scalability of our approach on many RL problems (from simple gyms to complex applications) and compare it with a Bayesian approach. Our results show that, with more generations, the approach finds optimal solutions that require fewer training episodes and are computationally cheap while being more robust for deployment. Our results are imperative to advance deep reinforcement learning controllers for real-world problems.

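Research Motivation and Objectives

Motivate the need for systematic hyperparameter tuning in deep reinforcement learning and its effect on performance and robustness.
Propose HPS-RL, a scalable, multi-objective GA-based framework that automatically searches the hyperparameters of various deep RL algorithms.
Demonstrate the scalability and efficiency of the approach across multiple RL tasks and hardware configurations.
Provide an open-source implementation so that researchers can explore hyperparameter optimization for deep RL.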

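Proposed Method

Represent hyperparameters as genes within the GA population.
Use crossover and mutation to evolve hyperparameters across generations.
Evaluate fitness by training the agent for a limited number of episodes and measuring cumulative reward, training time, and loss.
Apply roulette wheel selection to choose parents based on fitness.
Support variable-length gene blocks to adapt to different RL algorithms (e.g., DDPG and ACKTR).
Leverage distributed computation (a head node, a parameter server, and multiple workers) to speed up the search.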
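
The steps above can be sketched as a small genetic algorithm. This is a minimal illustration, not the HPS-RL implementation: the gene names, value ranges, weighted-sum fitness, uniform crossover, and resampling mutation are all assumptions chosen for brevity, and `toy_evaluate` is a hypothetical stand-in for a short RL training run.

```python
import random

# Hypothetical search space: shared genes plus an algorithm-specific block.
# Gene names and ranges are illustrative assumptions, not taken from the paper.
COMMON_GENES = {"learning_rate": (1e-5, 1e-2), "gamma": (0.90, 0.999)}
ALGO_GENES = {
    "DDPG": {"tau": (0.001, 0.1), "buffer_size": (1e4, 1e6)},
    "ACKTR": {"kfac_clip": (0.0001, 0.01)},
}


def gene_ranges(algo):
    """Variable-length layout: common genes followed by the algorithm's block."""
    return {**COMMON_GENES, **ALGO_GENES[algo]}


def random_individual(algo, rng):
    """Sample every gene uniformly (continuous ranges only, for brevity)."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in gene_ranges(algo).items()}


def fitness(reward, train_time, loss, weights=(1.0, 0.1, 0.1)):
    """Weighted-sum scalarization (an assumption): reward up, time and loss down."""
    w_r, w_t, w_l = weights
    return w_r * reward - w_t * train_time - w_l * loss


def roulette_select(population, scores, rng):
    """Pick one parent with probability proportional to shifted fitness."""
    low = min(scores)
    weights = [s - low + 1e-9 for s in scores]  # shift so all weights are positive
    return rng.choices(population, weights=weights, k=1)[0]


def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each gene is copied from one of the two parents."""
    return {k: (parent_a[k] if rng.random() < 0.5 else parent_b[k]) for k in parent_a}


def mutate(individual, algo, rng, rate=0.2):
    """Resample each gene within its range with probability `rate`."""
    ranges = gene_ranges(algo)
    return {
        k: (rng.uniform(*ranges[k]) if rng.random() < rate else v)
        for k, v in individual.items()
    }


def evolve(algo, evaluate, generations=5, pop_size=8, seed=0):
    """Run the GA; `evaluate` maps an individual to (reward, train_time, loss)."""
    rng = random.Random(seed)
    population = [random_individual(algo, rng) for _ in range(pop_size)]
    best, best_score = None, float("-inf")
    for _ in range(generations):
        scores = [fitness(*evaluate(ind)) for ind in population]
        for ind, score in zip(population, scores):
            if score > best_score:
                best, best_score = ind, score
        # Build the next generation from roulette-selected parents.
        population = [
            mutate(
                crossover(
                    roulette_select(population, scores, rng),
                    roulette_select(population, scores, rng),
                    rng,
                ),
                algo,
                rng,
            )
            for _ in range(pop_size)
        ]
    return best, best_score


def toy_evaluate(individual):
    """Hypothetical stand-in for a short training run; peaks near lr = 1e-3."""
    reward = -abs(individual["learning_rate"] - 1e-3) * 1000
    return reward, 1.0, 0.0


best, best_score = evolve("DDPG", toy_evaluate)
```

In a real setup, `evaluate` would train the agent for a limited number of episodes and return the measured cumulative reward, training time, and loss. Switching `algo` to `"ACKTR"` changes the chromosome length without touching the GA operators, which is the point of the variable-length gene blocks.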

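Experimental Results

Research Questions

RQ1: Can a genetic algorithm effectively perform multi-objective hyperparameter optimization for diverse deep RL algorithms?
RQ2: Does HPS-RL find hyperparameters that yield higher rewards in fewer training episodes and greater robustness across Gym environments?
RQ3: How does GA-based tuning compare with Bayesian optimization in efficiency and scalability for deep RL hyperparameters?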

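Key Findings

GA-based multi-objective search can evolve hyperparameters that reach higher rewards in fewer episodes as generations progress.
More generations (up to 50 in the experiments) improve fitness and reduce training requirements in the Cartpole, Lunar Landing, and Autonomous Laser environments.
HPS-RL demonstrates scalability on multi-core CPUs and GPUs using a distributed architecture comprising a head node, a parameter server, and workers.
Compared with Bayesian optimization, the GA approach can exploit parallelism and may explore the randomness of early-stage RL training more effectively, which is an advantage under limited compute resources.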
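
The parallelism point can be illustrated with a short sketch. Within one generation, the fitness evaluations are independent of each other, so they can be dispatched to workers at once, whereas classic sequential Bayesian optimization chooses each trial from the results of earlier ones. The example below uses a local thread pool purely as a stand-in for distributed workers; `evaluate_population` and `toy_train` are hypothetical names, not part of HPS-RL.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_population(population, evaluate, max_workers=4):
    """Score every individual concurrently; results keep the population's order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(evaluate, population))


def toy_train(hparams):
    """Hypothetical stand-in for one short training run on a worker."""
    return -abs(hparams["learning_rate"] - 1e-3)


population = [{"learning_rate": lr} for lr in (1e-4, 1e-3, 1e-2)]
scores = evaluate_population(population, toy_train)
best = population[scores.index(max(scores))]
```

A thread pool only demonstrates the dispatch pattern. CPU-bound training would need separate processes or machines to run truly in parallel.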

This review was generated by AI and checked by a human editor.