QUICK REVIEW

[Paper Review] Randomized Prior Functions for Deep Reinforcement Learning

Ian Osband, John Aslanides|arXiv (Cornell University)|Jun 8, 2018

Reinforcement Learning in Robotics|References: 6|Citations: 105

One-line summary

This paper proposes randomized prior functions to strengthen exploration within the standard agent-environment loop.

ABSTRACT

Dealing with uncertainty is essential for efficient reinforcement learning. There is a growing literature on uncertainty estimation for deep learning from fixed datasets, but many of the most popular approaches are poorly-suited to sequential decision problems. Other methods, such as bootstrap sampling, have no mechanism for uncertainty that does not come from the observed data. We highlight why this can be a crucial shortcoming and propose a simple remedy through addition of a randomized untrainable 'prior' network to each ensemble member. We prove that this approach is efficient with linear representations, provide simple illustrations of its efficacy with nonlinear representations and show that this approach scales to large-scale problems far better than previous attempts.
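
The abstract's central idea, adding a fixed random 'prior' to each trainable ensemble member, can be illustrated with a small linear sketch. Everything named here (PriorEnsembleMember, train_ensemble, prior_scale, the two-feature toy data) is an assumption made for illustration and is not taken from the paper; the sketch also leaves out bootstrap resampling and any regularization, so it should be read as a rough picture of the mechanism rather than the authors' algorithm.

```python
import random


class PriorEnsembleMember:
    """One ensemble member: a trainable linear model plus a fixed random linear prior."""

    def __init__(self, n_features, prior_scale, rng):
        # Prior weights are sampled once and never updated (the "untrainable" part).
        self.prior_w = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        # Trainable weights start at zero.
        self.w = [0.0] * n_features
        self.prior_scale = prior_scale

    @staticmethod
    def _dot(w, x):
        return sum(wi * xi for wi, xi in zip(w, x))

    def predict(self, x):
        # Prediction = trainable part + scaled fixed prior.
        return self._dot(self.w, x) + self.prior_scale * self._dot(self.prior_w, x)

    def sgd_step(self, x, y, lr):
        # Gradient of 0.5 * error**2 with respect to the trainable weights only.
        error = self.predict(x) - y
        self.w = [wi - lr * error * xi for wi, xi in zip(self.w, x)]


def train_ensemble(data, n_members=5, n_features=2, prior_scale=1.0,
                   lr=0.1, epochs=200, seed=0):
    """Fit every member to the same data; each member keeps its own random prior."""
    rng = random.Random(seed)
    members = [PriorEnsembleMember(n_features, prior_scale, rng)
               for _ in range(n_members)]
    for _ in range(epochs):
        for x, y in data:
            for member in members:
                member.sgd_step(x, y, lr)
    return members


if __name__ == "__main__":
    # Only the first feature direction is ever observed.
    data = [([1.0, 0.0], 2.0)]
    members = train_ensemble(data)
    print("seen input  :", [round(m.predict([1.0, 0.0]), 3) for m in members])
    print("unseen input:", [round(m.predict([0.0, 1.0]), 3) for m in members])
```

In this toy setup the members agree on the observed input and disagree on the unobserved one, because the only thing acting there is each member's own prior. That disagreement is the kind of uncertainty "that does not come from the observed data" that the abstract refers to.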

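Motivation and objectives

Motivate the use of randomized priors to improve exploration in deep reinforcement learning.
Explain how randomized prior functions integrate with the standard deep RL training loop.
Outline the agent-environment interaction workflow, including the use of a replay buffer.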

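Proposed method

Define an agent with act, update_buffer, and learn_from_buffer methods.
Run episodes, with the agent learning from the buffer on each iteration.
Reset the environment to obtain a fresh transition, then choose an action for the current state with agent.act.
Apply the action with environment.step and store the resulting transition with agent.update_buffer.
Repeat over episodes, learning continually from the buffered transitions.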
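
The loop described in the steps above can be sketched as follows. Only the method names act, update_buffer, and learn_from_buffer and the reset/step calls come from the review. The ChainEnv toy environment, the TabularAgent class, the run_training function, and all hyperparameters are hypothetical. For brevity, epsilon-greedy tabular Q-learning stands in for the agent here; it is not the randomized-prior agent from the paper.

```python
import random
from collections import deque


class ChainEnv:
    """Hypothetical toy environment: reach the right end of a short chain."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 1 moves right, any other action moves left (floored at 0).
        self.state = max(0, self.state + (1 if action == 1 else -1))
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


class TabularAgent:
    """Stand-in agent exposing act, update_buffer and learn_from_buffer."""

    def __init__(self, n_states, n_actions, buffer_size=1000, seed=0):
        self.rng = random.Random(seed)
        self.n_actions = n_actions
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.buffer = deque(maxlen=buffer_size)  # replay buffer

    def act(self, state, epsilon=0.2):
        # Epsilon-greedy with random tie-breaking (an illustrative choice).
        if self.rng.random() < epsilon:
            return self.rng.randrange(self.n_actions)
        best = max(self.q[state])
        return self.rng.choice([a for a, v in enumerate(self.q[state]) if v == best])

    def update_buffer(self, transition):
        self.buffer.append(transition)

    def learn_from_buffer(self, batch_size=16, lr=0.5, gamma=0.9):
        if not self.buffer:
            return
        # Sample with replacement and apply a one-step Q-learning update.
        for s, a, r, s2, done in self.rng.choices(self.buffer, k=batch_size):
            target = r if done else r + gamma * max(self.q[s2])
            self.q[s][a] += lr * (target - self.q[s][a])


def run_training(n_episodes=50, max_steps=100, seed=0):
    env = ChainEnv()
    agent = TabularAgent(n_states=env.length, n_actions=2, seed=seed)
    returns = []
    for _ in range(n_episodes):
        agent.learn_from_buffer()            # learn from buffered transitions
        state, done, total = env.reset(), False, 0.0
        for _ in range(max_steps):           # cap episode length for safety
            action = agent.act(state)
            next_state, reward, done = env.step(action)
            agent.update_buffer((state, action, reward, next_state, done))
            state, total = next_state, total + reward
            if done:
                break
        returns.append(total)
    return agent, returns


if __name__ == "__main__":
    agent, returns = run_training()
    print("episodes:", len(returns), "buffer size:", len(agent.buffer))
```

Keeping the loop independent of how the agent chooses actions is what would let a randomized-prior agent be swapped in without changing the training code, provided it exposes the same three methods.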

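Experimental results

Research questions

RQ1: Do randomized prior functions improve exploration efficiency in deep reinforcement learning?
RQ2: Within the standard deep RL training loop, how do randomized priors affect learning stability and sample efficiency?

Key findings

Not stated in the provided excerpt.
The provided text reports no quantitative results.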

This review was generated by AI and checked by a human editor.