QUICK REVIEW

[Paper Review] Randomized Prior Functions for Deep Reinforcement Learning

Ian Osband, John Aslanides|arXiv (Cornell University)|Jun 8, 2018

Reinforcement Learning in Robotics | References: 6 | Cited by: 105

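One-Sentence Summary

This paper proposes randomized prior functions to improve exploration in deep reinforcement learning within the standard agent-environment loop.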

ABSTRACT

Dealing with uncertainty is essential for efficient reinforcement learning. There is a growing literature on uncertainty estimation for deep learning from fixed datasets, but many of the most popular approaches are poorly suited to sequential decision problems. Other methods, such as bootstrap sampling, have no mechanism for uncertainty that does not come from the observed data. We highlight why this can be a crucial shortcoming and propose a simple remedy through addition of a randomized untrainable "prior" network to each ensemble member. We prove that this approach is efficient with linear representations, provide simple illustrations of its efficacy with nonlinear representations and show that this approach scales to large-scale problems far better than previous attempts.
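
The abstract's remedy, a fixed random "prior" added to each trainable ensemble member, can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `PriorEnsembleMember`, the `prior_scale` parameter, the linear model, and the single training point are all assumptions chosen for brevity, and bootstrap resampling of the data is omitted.

```python
import random


class PriorEnsembleMember:
    """Hypothetical ensemble member: trainable linear model plus a fixed random prior."""

    def __init__(self, n_features, prior_scale=1.0, seed=0):
        rng = random.Random(seed)
        # Trainable weights start at zero, so the initial prediction is the prior alone.
        self.weights = [0.0] * n_features
        # Prior weights are sampled once and never updated afterwards.
        self.prior_weights = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        self.prior_scale = prior_scale

    def prior(self, x):
        return self.prior_scale * sum(w * xi for w, xi in zip(self.prior_weights, x))

    def predict(self, x):
        trainable = sum(w * xi for w, xi in zip(self.weights, x))
        return trainable + self.prior(x)

    def sgd_step(self, x, target, learning_rate=0.1):
        # The squared-error gradient is applied to the trainable weights only.
        error = self.predict(x) - target
        self.weights = [w - learning_rate * error * xi for w, xi in zip(self.weights, x)]


ensemble = [PriorEnsembleMember(n_features=2, seed=k) for k in range(5)]

# Fit every member to the single observation x = (1, 0) with target 1.
for member in ensemble:
    for _ in range(200):
        member.sgd_step([1.0, 0.0], 1.0)

seen = [member.predict([1.0, 0.0]) for member in ensemble]
unseen = [member.predict([0.0, 1.0]) for member in ensemble]
```

In this sketch the members agree on the observed input, while their predictions on the unobserved input are still determined by their individual priors. That disagreement is the kind of uncertainty "that does not come from the observed data" that the abstract refers to.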

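Motivation and Objectives

Motivate the use of randomized priors to improve exploration in deep reinforcement learning.
Describe how randomized prior functions integrate with the standard DRL training loop.
Outline the agent-environment interaction workflow, including the use of a replay buffer.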

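Proposed Method

Define an agent with act, update_buffer, and learn_from_buffer methods.
Run episodes, having the agent learn from the buffer at the start of each iteration.
Reset the environment to begin a new episode, then choose an action for the current state via agent.act.
Apply the action with environment.step and store the resulting transition via agent.update_buffer.
Iterate over many episodes so the agent keeps learning from the buffered transitions.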
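
The loop described above can be sketched as follows. Only the method names act, update_buffer, learn_from_buffer, and environment.step come from the text. The `ChainEnvironment` task, the `PlaceholderAgent` with its random policy, the transition tuple layout, and the `run` helper are hypothetical stand-ins so that the example is self-contained.

```python
import random
from collections import deque


class ChainEnvironment:
    """Hypothetical toy task: walk right along a short chain to earn a reward."""

    def __init__(self, length=5, horizon=10):
        self.length, self.horizon = length, horizon

    def reset(self):
        self.position, self.steps = 0, 0
        return self.position

    def step(self, action):
        # Action 1 moves right, action 0 moves left (clipped at both ends).
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.length - 1)
        self.steps += 1
        reward = 1.0 if self.position == self.length - 1 else 0.0
        done = reward > 0 or self.steps >= self.horizon
        return self.position, reward, done


class PlaceholderAgent:
    """Stand-in agent exposing the three methods named in the text."""

    def __init__(self, buffer_size=1000, seed=0):
        self.buffer = deque(maxlen=buffer_size)
        self.rng = random.Random(seed)
        self.learn_calls = 0

    def act(self, state):
        # A real agent would query its value estimate; this one acts at random.
        return self.rng.choice([0, 1])

    def update_buffer(self, transition):
        self.buffer.append(transition)

    def learn_from_buffer(self):
        # A real agent would fit its networks to self.buffer here.
        self.learn_calls += 1


def run(agent, environment, num_episodes):
    for _ in range(num_episodes):
        agent.learn_from_buffer()
        state = environment.reset()
        done = False
        while not done:
            action = agent.act(state)
            next_state, reward, done = environment.step(action)
            agent.update_buffer((state, action, reward, next_state, done))
            state = next_state


agent = PlaceholderAgent()
run(agent, ChainEnvironment(), num_episodes=3)
```

Learning once per episode, before acting, follows the order given in the steps above. An agent built on randomized priors would plug into the same interface by replacing the random policy and the empty learning step.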

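Experimental Results

Research Questions

RQ1: Do randomized prior functions improve exploration efficiency in deep reinforcement learning?
RQ2: How do randomized priors affect learning stability and sample efficiency in the standard DRL training loop?

Key Findings

Not available in the provided excerpt.
No quantitative results are shown in the provided text.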

This review was generated by AI and reviewed by a human editor.