QUICK REVIEW

[Paper Review] Metatrace: Online Step-size Tuning by Meta-gradient Descent for Reinforcement Learning Control

Kenny Young, Baoxiang Wang | arXiv (Cornell University) | May 10, 2018

Reinforcement Learning in Robotics | References: 9 | Citations: 5

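One-line summary

Metatrace proposes online step-size tuning for reinforcement learning control, using meta-gradient descent with eligibility traces to stabilize learning in non-stationary settings. With both linear and nonlinear function approximation, it improves robustness to the initial step-size choice and speeds up learning, with the largest benefit when non-stationarity is pronounced.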

ABSTRACT

Reinforcement learning (RL) has had many successes in both deep and shallow settings. In both cases, significant hyperparameter tuning is often required to achieve good performance. Furthermore, when nonlinear function approximation is used, non-stationarity in the state representation can lead to learning instability. A variety of techniques exist to combat this --- most notably large experience replay buffers or the use of multiple parallel actors. These techniques come at the cost of moving away from the online RL problem as it is traditionally formulated (i.e., a single agent learning online without maintaining a large database of training examples). Meta-learning can potentially help with both these issues by tuning hyperparameters online and allowing the algorithm to more robustly adjust to non-stationarity in a problem. This paper applies meta-gradient descent to derive a set of step-size tuning algorithms specifically for online RL control with eligibility traces. Our novel technique, Metatrace, makes use of an eligibility trace analogous to methods like $TD(\lambda)$. We explore tuning both a single scalar step-size and a separate step-size for each learned parameter. We evaluate Metatrace first for control with linear function approximation in the classic mountain car problem and then in a noisy, non-stationary version. Finally, we apply Metatrace for control with nonlinear function approximation in 5 games in the Arcade Learning Environment where we explore how it impacts learning speed and robustness to initial step-size choice. Results show that the meta-step-size parameter of Metatrace is easy to set, Metatrace can speed learning, and Metatrace can allow an RL algorithm to deal with non-stationarity in the learning task.

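Motivation and objectives

Address hyperparameter sensitivity in online reinforcement learning, in particular the choice of step-size.
Improve learning stability in non-stationary settings where the state representation changes during training.
Enable adaptive online step-size tuning without relying on large experience replay buffers or multiple parallel actors.
Develop a meta-learning method that tunes step-sizes dynamically during training using eligibility traces.
Demonstrate improved robustness and efficiency with both linear and nonlinear function approximation.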

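Proposed method

Applies meta-gradient descent to learn step-sizes by differentiating through the update rule of the RL algorithm.
Defines a meta-objective from the base learner's own loss along the trajectory and minimizes it with respect to the step-size, using eligibility traces to propagate credit through time.
Derives update rules from the gradient of the meta-objective for both a single global (scalar) step-size and a separate step-size for each learned parameter.
Uses an eligibility trace analogous to TD(λ) to track temporal credit assignment, which keeps the meta-gradient computation cheap enough to run online.
Maintains a separate meta-level update, governed by its own meta-step-size, that adjusts the step-sizes from the observed errors.
Integrates the step-size tuning into an online RL control algorithm with eligibility traces, so that step-sizes are learned alongside the base learner's parameters.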
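
The ideas listed above can be illustrated with a deliberately simplified sketch. It is not the paper's Metatrace algorithm: it does value prediction rather than control, uses tabular (one-hot) features on a small random-walk task, and tunes only a single scalar step-size. The function name, the random-walk task, the clipping bounds, and all default values are assumptions made for illustration. The step-size is parameterized as alpha = exp(beta), a vector h tracks a semi-gradient estimate of how the weights depend on beta, and a second eligibility trace carries that sensitivity through time in the same way the TD(λ) trace carries the value gradient.

```python
import math
import random


def td_lambda_with_meta_step_size(
    num_states=5, episodes=200, gamma=1.0, lam=0.8,
    init_alpha=0.01, meta_step=0.01, seed=0,
):
    """Tabular TD(lambda) on a random walk with one learned scalar step-size.

    Illustrative sketch only (hypothetical names and defaults); it is not
    the exact algorithm from the Metatrace paper.
    """
    rng = random.Random(seed)
    w = [0.0] * num_states       # value estimate per state (one-hot features)
    h = [0.0] * num_states       # running estimate of d w / d beta
    beta = math.log(init_alpha)  # alpha = exp(beta) keeps the step-size positive
    # Safety clip on beta: an assumption of this sketch, not part of the method.
    beta_min, beta_max = math.log(1e-4), math.log(0.1)

    for _ in range(episodes):
        s = num_states // 2      # every episode starts in the middle state
        z = [0.0] * num_states   # eligibility trace for the weights
        z_beta = 0.0             # eligibility trace for the step-size parameter
        done = False
        while not done:
            s_next = s + rng.choice((-1, 1))
            done = s_next < 0 or s_next >= num_states
            reward = 1.0 if s_next >= num_states else 0.0  # +1 only on the right exit
            v_next = 0.0 if done else w[s_next]
            delta = reward + gamma * v_next - w[s]          # TD error

            # Decay both traces, then add the current gradients:
            # d v(s) / d w is one-hot at s, and d v(s) / d beta is h[s].
            z = [gamma * lam * zi for zi in z]
            z[s] += 1.0
            z_beta = gamma * lam * z_beta + h[s]

            # Meta update: semi-gradient descent on 0.5 * delta**2 w.r.t. beta.
            beta += meta_step * delta * z_beta
            beta = min(beta_max, max(beta_min, beta))
            alpha = math.exp(beta)

            # Base TD(lambda) update, plus the matching recursion for h
            # (the dependence of the bootstrapped target on beta is ignored).
            h_s = h[s]
            for i in range(num_states):
                h[i] += alpha * z[i] * (delta - h_s)
                w[i] += alpha * delta * z[i]
            s = s_next
    return w, math.exp(beta)
```

Calling `td_lambda_with_meta_step_size()` returns the learned state values and the final step-size. A per-parameter variant would keep one beta and one trace entry per weight instead of a single scalar; the scalar form is used here only because it is the shortest one to read.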

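Experimental results

Research questions

RQ1: Can meta-gradient descent with eligibility traces be applied effectively to online step-size tuning in reinforcement learning?
RQ2: Compared with a fixed step-size, how does Metatrace improve learning speed and stability in non-stationary environments?
RQ3: With nonlinear function approximation, how much does Metatrace reduce sensitivity to the initial step-size choice?
RQ4: Does per-parameter step-size adaptation through meta-learning give faster convergence and better performance on complex control tasks?
RQ5: Without a conventional replay buffer, does Metatrace still work well when the state distribution shifts substantially?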

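Key findings

Compared with a fixed step-size, Metatrace markedly accelerates learning on the classic mountain car problem.
In the noisy, non-stationary version of mountain car, Metatrace maintains stable learning where fixed step-sizes fail or diverge.
In the Arcade Learning Environment, sensitivity to the initial step-size choice is reduced, giving stable performance across a wide range of settings.
The meta-step-size hyperparameter of Metatrace is easy to set and works across the tested tasks without detailed tuning.
Metatrace also enables robust learning with nonlinear function approximation, improving both learning speed and stability on 5 Atari games.
Per-parameter step-size adaptation in Metatrace yields faster policy convergence and better final performance than scalar step-size tuning.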

This review was generated by AI and checked by a human editor.