QUICK REVIEW

[Paper Review] Neural Map: Structured Memory for Deep Reinforcement Learning

Emilio Parisotto, Ruslan Salakhutdinov|arXiv (Cornell University)|Feb 27, 2017

Reinforcement Learning in Robotics|References: 22|Citations: 102

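One-line summary

Introduces the Neural Map, a structured external memory for DRL that writes only at the agent's current position and stores and retrieves environment information through a global read and a context read, improving memory-based reasoning in 2D/3D mazes and enabling generalization to unseen environments.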

ABSTRACT

A critical component to enabling intelligent reasoning in partially observable environments is memory. Despite this importance, Deep Reinforcement Learning (DRL) agents have so far used relatively simple memory architectures, with the main methods to overcome partial observability being either a temporal convolution over the past k frames or an LSTM layer. More recent work (Oh et al., 2016) has went beyond these architectures by using memory networks which can allow more sophisticated addressing schemes over the past k frames. But even these architectures are unsatisfactory due to the reason that they are limited to only remembering information from the last k frames. In this paper, we develop a memory system with an adaptable write operator that is customized to the sorts of 3D environments that DRL agents typically interact with. This architecture, called the Neural Map, uses a spatially structured 2D memory image to learn to store arbitrary information about the environment over long time lags. We demonstrate empirically that the Neural Map surpasses previous DRL memories on a set of challenging 2D and 3D maze environments and show that it is capable of generalizing to environments that were not seen during training.

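Motivation and objectives

Address the memory limitations of DRL agents in partially observable, navigation-heavy 3D environments.
Propose a structured external memory (the Neural Map) with an adaptable, position-specific write for storing information over long time spans.
Show that the Neural Map outperforms LSTM and MemNN baselines on 2D maze tasks and generalizes to unseen environments, including a 3D Doom setting.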

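Proposed method

Define the spatial memory M as a C x H x W map (a 2D grid of C-dimensional feature vectors) tied to the agent's position.
Use a global read that produces r_t from M through a convolutional network.
Use a context read that produces a context vector c_t by soft attention over M, with a query derived from s_t and r_t.
Compute a local write w_{t+1}^{(x_t,y_t)} from s_t, r_t, c_t, and the current map value, and update M at the agent's position.
Optional extensions: (i) a localized read, (ii) a key-value context read, (iii) a GRU-based gated local write.
Optionally extend to an egocentric coordinate frame: apply a counter-transform that keeps the agent at the center of the map, and update the map with egoupdate.
Train with an Asynchronous Advantage Actor-Critic (A3C) framework modified to use synchronous updates across multiple environments.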
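
The context read and local write listed above can be illustrated with a small sketch. This is a toy, standard-library Python version written for this review, not the authors' implementation: the function names (`context_read`, `local_write`, `toy_write_vector`) are hypothetical, the map is stored as an H x W grid of C-dimensional lists, and the learned write network is replaced by a fixed tanh-of-average stand-in. The convolutional global read is not modeled; r_t is simply passed in as a vector.

```python
import math


def context_read(M, q):
    """Soft attention over all map positions.

    M: H x W grid of C-dimensional feature vectors (nested lists).
    q: C-dimensional query (in the method it is derived from s_t and r_t).
    Returns (c, alpha): the context vector and the H x W attention weights.
    """
    H, W, C = len(M), len(M[0]), len(q)
    # Dot-product score between the query and the feature at each position.
    scores = [[sum(q[k] * M[y][x][k] for k in range(C)) for x in range(W)]
              for y in range(H)]
    # Numerically stable softmax over all H * W positions.
    top = max(max(row) for row in scores)
    exps = [[math.exp(s - top) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    alpha = [[e / total for e in row] for row in exps]
    # Context vector: attention-weighted sum of the map features.
    c = [sum(alpha[y][x] * M[y][x][k] for y in range(H) for x in range(W))
         for k in range(C)]
    return c, alpha


def toy_write_vector(s, r, c, old):
    """Hypothetical stand-in for the learned write network (assumption):
    tanh of the element-wise average of the four inputs."""
    return [math.tanh((a + b + d + e) / 4.0)
            for a, b, d, e in zip(s, r, c, old)]


def local_write(M, pos, w):
    """Return a copy of M whose feature at pos = (x, y) is replaced by w."""
    x, y = pos
    new_M = [[list(cell) for cell in row] for row in M]
    new_M[y][x] = list(w)
    return new_M


# Toy example: a 3 x 3 map with 4 channels and one stored "landmark" feature.
C, H, W = 4, 3, 3
M = [[[0.0] * C for _ in range(W)] for _ in range(H)]
M[1][2] = [1.0, 0.0, 0.0, 0.0]      # feature written on an earlier visit
q = [5.0, 0.0, 0.0, 0.0]            # query that matches the landmark
c, alpha = context_read(M, q)       # attention concentrates on (x=2, y=1)

s = [0.5, 0.5, 0.5, 0.5]            # toy state embedding s_t
r = [0.0, 0.0, 0.0, 0.0]            # toy global-read vector r_t
pos = (0, 0)                        # agent's current (x, y) position
w = toy_write_vector(s, r, c, M[pos[1]][pos[0]])
M_next = local_write(M, pos, w)     # only the agent's cell changes
```

In this sketch only the cell at the agent's position differs between `M` and `M_next`, which is the property that lets the map keep information from cells visited long ago.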

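Experimental results

Research questions

RQ1: Can a spatially structured external memory with localized writes and context addressing improve memory-based decision making in partially observable environments?
RQ2: Does the Neural Map memory enable long-horizon reasoning and generalization to unseen mazes and more complex 3D environments?
RQ3: How do variants such as the GRU-based write, the key-value context read, and egocentric mapping affect performance and stability?
RQ4: How does the Neural Map compare with LSTM and MemNN baselines on 2D goal-search mazes and 3D Doom mazes?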

Key findings

Agent	Train (7-11)	Train (13-15)	Train Total	Test (7-11)	Test (13-15)	Test Total
LSTM	60.6%	41.8%	59.3%	65.5%	47.5%	57.4%
MemNN-32	85.1%	58.2%	77.8%	92.6%	69.7%	83.4%
Neural Map	92.4%	80.5%	89.2%	93.5%	87.9%	91.7%
Neural Map (GRU)	97.0%	89.2%	94.9%	97.7%	94.0%	96.4%

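The Neural Map achieves higher success rates than LSTM and MemNN on both the training mazes and the held-out test mazes of the 2D Goal-Search task.
The GRU-based Neural Map further improves training speed, final performance, and training stability over the standard Neural Map.
On the Doom 3D mazes, LSTM + Neural Map (GRU) outperforms all other methods on both training maps and unseen maps.
Qualitative analysis shows the context read focusing on landmark indicators, which suggests the memory is used effectively for long-range association.
Memory networks with a fixed-size history struggle on long mazes, whereas the Neural Map scales through its map-based memory.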


This review was generated by AI and checked by a human editor.