QUICK REVIEW

[Paper Review] Neural Map: Structured Memory for Deep Reinforcement Learning

Emilio Parisotto, Ruslan Salakhutdinov|arXiv (Cornell University)|Feb 27, 2017

Reinforcement Learning in Robotics | References: 22 | Cited by: 102

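One-Sentence Summary

Introduces the Neural Map, a structured, writable external memory for DRL that writes only at the agent's current position and uses a global read and a context read to store and retrieve information about the environment, improving memory-based reasoning in 2D and 3D mazes and generalizing to unseen environments.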

ABSTRACT

A critical component to enabling intelligent reasoning in partially observable environments is memory. Despite this importance, Deep Reinforcement Learning (DRL) agents have so far used relatively simple memory architectures, with the main methods to overcome partial observability being either a temporal convolution over the past k frames or an LSTM layer. More recent work (Oh et al., 2016) has went beyond these architectures by using memory networks which can allow more sophisticated addressing schemes over the past k frames. But even these architectures are unsatisfactory due to the reason that they are limited to only remembering information from the last k frames. In this paper, we develop a memory system with an adaptable write operator that is customized to the sorts of 3D environments that DRL agents typically interact with. This architecture, called the Neural Map, uses a spatially structured 2D memory image to learn to store arbitrary information about the environment over long time lags. We demonstrate empirically that the Neural Map surpasses previous DRL memories on a set of challenging 2D and 3D maze environments and show that it is capable of generalizing to environments that were not seen during training.

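Motivation and Objectives

Identify and address the memory limitations of DRL agents operating in partially observable, navigation-centric 3D environments.
Propose a structured external memory (the Neural Map) with an adaptable, location-specific write that stores salient environment information over long time scales.
Show that the Neural Map outperforms LSTM and MemNN baselines on 2D maze tasks and generalizes to unseen environments, including a 3D Doom setting.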

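Proposed Method

Define the spatial memory M as a C x H x W map (a 2D grid of C-dimensional feature vectors) tied to the agent's position.
Use a global read that passes M through a convolutional network to produce r_t.
Use a context read with soft attention over M, driven by a query derived from s_t and r_t, to produce the context vector c_t.
Compute the local write w_{t+1}^{(x_t,y_t)} from s_t, r_t, c_t, and the current map value, then update M at the agent's position.
Optional variants: (i) localized reads, (ii) key-value context reads, (iii) GRU-based gated local writes.
Optional ego-centric extension: apply a counter-transformation so that the agent stays at the center of the map, and update through egoupdate.
Train with the Asynchronous Advantage Actor-Critic (A3C) framework, modified to use synchronous updates across multiple environments.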
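The read and write steps listed above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the parameter names (W_read, W_query, W_write), the sizes, and the random initialisation are hypothetical, and the global read is simplified to mean pooling plus a linear layer where the paper uses a convolutional network.

```python
import numpy as np

rng = np.random.default_rng(0)

C, H, W = 8, 5, 5   # memory channels and map size (illustrative values)
S = 6               # size of the state embedding s_t (illustrative value)

# Random stand-ins for learned parameters (hypothetical names).
W_read = rng.normal(scale=0.1, size=(C, C))           # global read projection
W_query = rng.normal(scale=0.1, size=(C, S + C))      # query from [s_t, r_t]
W_write = rng.normal(scale=0.1, size=(C, S + 3 * C))  # write from [s_t, r_t, c_t, M at agent cell]


def softmax(z):
    z = z - z.max()          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def neural_map_step(M, s_t, pos):
    """One memory update. M has shape (C, H, W); pos is the agent's (x, y) cell."""
    x, y = pos

    # Global read: simplified here to mean pooling followed by a linear layer.
    r_t = np.tanh(W_read @ M.mean(axis=(1, 2)))

    # Context read: soft attention over all H*W cells.
    q_t = W_query @ np.concatenate([s_t, r_t])
    cells = M.reshape(C, H * W)        # one column per map cell
    alpha = softmax(q_t @ cells)       # attention weights, summing to 1
    c_t = cells @ alpha                # attention-weighted average of cell features

    # Local write: a new feature vector for the agent's current cell only.
    w_next = np.tanh(W_write @ np.concatenate([s_t, r_t, c_t, M[:, y, x]]))
    M_next = M.copy()
    M_next[:, y, x] = w_next

    # Features that a policy head would consume.
    o_t = np.concatenate([r_t, c_t, w_next])
    return M_next, o_t, alpha


# Usage: start from an empty map and write once at cell (x=2, y=3).
M0 = np.zeros((C, H, W))
s = rng.normal(size=S)
M1, o, alpha = neural_map_step(M0, s, pos=(2, 3))
changed = np.any(M1 != M0, axis=0)     # True only where the map was modified
```

After the call, `changed` is True at a single cell, which is the write locality the method relies on: everything the agent stored at other positions is left untouched.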

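Experimental Results

Research Questions

RQ1: Does a spatially structured external memory with write locality and context addressing improve memory-based decision making in partially observable environments?
RQ2: Does the Neural Map memory enable reasoning over longer time scales and better generalization to unseen mazes and more complex 3D environments?
RQ3: How do variants such as GRU-based writes, key-value context reads, and ego-centric mapping affect performance and stability?
RQ4: How does the Neural Map compare with LSTM and MemNN baselines on the 2D Goal-Search mazes and the 3D Doom mazes?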

Key Findings

Agent	Train (7-11)	Train (13-15)	Train Total	Test (7-11)	Test (13-15)	Test Total
LSTM	60.6%	41.8%	59.3%	65.5%	47.5%	57.4%
MemNN-32	85.1%	58.2%	77.8%	92.6%	69.7%	83.4%
Neural Map	92.4%	80.5%	89.2%	93.5%	87.9%	91.7%
Neural Map (GRU)	97.0%	89.2%	94.9%	97.7%	94.0%	96.4%

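The Neural Map achieves higher success rates than LSTM and MemNN on both the training and the held-out test mazes of 2D Goal-Search.
The GRU-based Neural Map further improves training speed, final performance, and training stability over the standard Neural Map.
In the 3D Doom mazes, LSTM+Neural Map (GRU) outperforms the other methods on both training and unseen maps.
Qualitative analysis shows that the context read attends to the landmark indicator, suggesting that the memory is used effectively for long-range associations.
Memory networks with a fixed-size history struggle on longer mazes, whereas the map-based memory of the Neural Map scales better.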
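The last finding can be illustrated with a toy comparison. This is a sketch, not the paper's experiment: the corridor, the observation strings, and run_episode are hypothetical, and k = 32 only mirrors the MemNN-32 label in the table, on the assumption that the label refers to a 32-frame history. A plain dict stands in for the fixed-size H x W map.

```python
from collections import deque


def run_episode(steps, k):
    """Walk `steps` cells down a corridor after seeing an indicator at cell 0."""
    frames = deque(maxlen=k)   # fixed-size history: keeps only the last k observations
    grid = {}                  # position-indexed memory: one slot per visited cell
    for t in range(steps + 1):
        obs = "indicator" if t == 0 else "corridor"
        frames.append(obs)     # the oldest frame is dropped once k frames are stored
        grid[t] = obs          # write at the agent's current position
    # Report whether each memory still holds the indicator at the end.
    return "indicator" in frames, "indicator" in grid.values()


print(run_episode(10, 32))   # (True, True): short walk, both memories keep the indicator
print(run_episode(50, 32))   # (False, True): the k-frame window has forgotten it
```

The position-indexed memory keeps the indicator however long the walk is, because its slots are tied to places rather than to recent time steps.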

This review was generated by AI and reviewed by a human editor.