QUICK REVIEW

[Paper Review] MinAtar: An Atari-Inspired Testbed for Thorough and Reproducible Reinforcement Learning Experiments

Kenny Young, Tian Tian|arXiv (Cornell University)|Mar 7, 2019

Reinforcement Learning in Robotics|References: 16|Citations: 42

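One-Sentence Summary

MinAtar provides five simplified, Atari-inspired environments, each played on a 10×10 grid with semantically meaningful channels, enabling reproducible, behaviour-focused reinforcement learning experiments with reduced representational complexity.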

ABSTRACT

The Arcade Learning Environment (ALE) is a popular platform for evaluating reinforcement learning agents. Much of the appeal comes from the fact that Atari games demonstrate aspects of competency we expect from an intelligent agent and are not biased toward any particular solution approach. The challenge of the ALE includes (1) the representation learning problem of extracting pertinent information from raw pixels, and (2) the behavioural learning problem of leveraging complex, delayed associations between actions and rewards. Often, the research questions we are interested in pertain more to the latter, but the representation learning problem adds significant computational expense. We introduce MinAtar, short for miniature Atari, a new set of environments that capture the general mechanics of specific Atari games while simplifying the representational complexity to focus more on the behavioural challenges. MinAtar consists of analogues of five Atari games: Seaquest, Breakout, Asterix, Freeway and Space Invaders. Each MinAtar environment provides the agent with a 10x10xn binary state representation. Each game plays out on a 10x10 grid with n channels corresponding to game-specific objects, such as ball, paddle and brick in the game Breakout. To investigate the behavioural challenges posed by MinAtar, we evaluated a smaller version of the DQN architecture as well as online actor-critic with eligibility traces. With the representation learning problem simplified, we can perform experiments with significantly less computational expense. In our experiments, we use the saved compute time to perform step-size parameter sweeps and more runs than is typical for the ALE. Experiments like this improve reproducibility, and allow us to draw more confident conclusions. We hope that MinAtar can allow researchers to thoroughly investigate behavioural challenges similar to those inherent in the ALE.

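Motivation and Objectives

Provide a smaller, more reproducible testbed that captures the core behavioural challenges of Atari games.
Reduce the complexity of representation learning while preserving the essential game mechanics.
Enable broad, statistically robust experiments through faster training and more random seeds.
Demonstrate the performance of different RL methods on the behaviour-focused tasks in MinAtar.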

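Proposed Method

Map five MinAtar environments (Seaquest, Breakout, Asterix, Freeway, and Space Invaders) onto a 10×10 grid with n semantic channels.
Use a reduced action space of 6 actions (movement in the 4 orthogonal directions, fire, and no-op).
Provide simplified rewards and semantically meaningful input channels, avoiding pixel-based representation learning.
Incorporate stochasticity through sticky actions and random spawn locations to introduce variability.
Evaluate a DQN variant with experience replay and an online actor-critic with eligibility traces (AC(λ)).
Use small networks (DQN: 16 3×3 convolutional filters and a 128-unit fully connected layer) and train for 5 million frames, which makes CPU training and parameter sweeps feasible.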
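
The state representation and sticky-action mechanism described above can be sketched in a few lines of Python. This is an illustrative sketch, not MinAtar's actual code: the names `encode_state` and `StickyActions` are hypothetical, the three channels are taken from the Breakout example in the abstract, and the stickiness value of 0.1 and the choice of action 0 as no-op are assumptions made for the example.

```python
import random

GRID = 10  # each MinAtar game plays out on a 10x10 grid


def encode_state(objects, channels):
    """Build a 10x10xn binary state.

    state[row][col][k] is 1 if an object belonging to channel k
    occupies that cell, and 0 otherwise.
    """
    index = {name: k for k, name in enumerate(channels)}
    state = [[[0] * len(channels) for _ in range(GRID)] for _ in range(GRID)]
    for name, row, col in objects:
        state[row][col][index[name]] = 1
    return state


class StickyActions:
    """With probability `stickiness`, repeat the previous action
    instead of the one the agent requested (assumed default: 0.1)."""

    def __init__(self, stickiness=0.1, seed=None):
        self.stickiness = stickiness
        self.rng = random.Random(seed)
        self.last_action = 0  # assumption: action 0 is the no-op

    def apply(self, action):
        if self.rng.random() < self.stickiness:
            action = self.last_action  # the environment ignores the request
        self.last_action = action
        return action


if __name__ == "__main__":
    # Hypothetical Breakout-like frame with one object per channel.
    channels = ["ball", "paddle", "brick"]
    frame = encode_state(
        [("paddle", 9, 4), ("ball", 3, 5), ("brick", 1, 0)], channels
    )
    print(frame[9][4])  # channel vector of the cell holding the paddle
```

Because every channel already names a game object, an agent reading this state does not have to learn to recognise objects from pixels, which is the representational simplification the review describes.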

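Experimental Results

Research Questions

RQ1: How do different RL algorithms, such as DQN and AC(λ), with or without experience replay, perform on simplified Atari-like tasks that emphasize behaviour over representation learning?
RQ2: How do the step-size hyperparameter and eligibility traces affect learning stability and performance in MinAtar environments?
RQ3: Do MinAtar environments exhibit qualitative behavioural differences and curriculum-like dynamics similar to those of Atari games while enabling more exhaustive experiments?
RQ4: Can MinAtar serve as an efficient, low-compute proxy for studying exploration, credit assignment, and policy stability?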

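Key Findings

DQN improves faster than AC(λ) early in training, but AC(λ) can overtake DQN in the long run in several environments.
Experience replay gives DQN a clear advantage in every game; DQN without replay performs poorly.
Online AC(λ) with RMSProp and SiLU/dSiLU activation functions shows stability and competitive performance on some tasks.
MinAtar makes it feasible to train with 30 random seeds per agent-environment pair, which narrows confidence intervals and allows thorough hyperparameter sweeps.
Observed qualitative behaviours, such as a path-opening strategy in Breakout and a tendency to surface in Seaquest, show meaningful behavioural dynamics without the full complexity of Atari.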
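
To illustrate why more seeds narrow the confidence interval, here is a minimal sketch that computes a mean and a normal-approximation 95% interval over per-seed returns. The function name `mean_and_ci` is hypothetical, the data are synthetic numbers generated for the example (not results from the paper), and the paper's own interval procedure may differ.

```python
import math
import random
import statistics


def mean_and_ci(returns, z=1.96):
    """Return (mean, half_width) of a normal-approximation 95% interval."""
    n = len(returns)
    mean = statistics.fmean(returns)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    std_err = statistics.stdev(returns) / math.sqrt(n)
    return mean, z * std_err


if __name__ == "__main__":
    # Synthetic per-seed final returns, made up purely for illustration.
    rng = random.Random(0)
    synthetic = [rng.gauss(20.0, 5.0) for _ in range(30)]
    for n in (5, 30):
        mean, half = mean_and_ci(synthetic[:n])
        print(f"{n:2d} seeds: {mean:.2f} +/- {half:.2f}")
```

The half-width shrinks roughly in proportion to 1/sqrt(n), so going from 5 to 30 seeds narrows the interval by about a factor of 2.4 when the spread across seeds stays the same.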


This review was generated by AI and checked by a human editor.