QUICK REVIEW

[Paper Review] A Brief Survey of Deep Reinforcement Learning

Kai Arulkumaran, Marc Peter Deisenroth | arXiv (Cornell University) | Aug 19, 2017

Reinforcement Learning in Robotics | References: 121 | Cited by: 750

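One-Sentence Summary

This paper surveys deep reinforcement learning (DRL): it explains how deep networks let reinforcement learning scale to high-dimensional problems, reviews key value-based and policy-based DRL methods (e.g. DQN, TRPO, A3C), and discusses applications, benchmarks, challenges, and future research directions.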

ABSTRACT

Deep reinforcement learning is poised to revolutionise the field of AI and represents a step towards building autonomous systems with a higher level understanding of the visual world. Currently, deep learning is enabling reinforcement learning to scale to problems that were previously intractable, such as learning to play video games directly from pixels. Deep reinforcement learning algorithms are also applied to robotics, allowing control policies for robots to be learned directly from camera inputs in the real world. In this survey, we begin with an introduction to the general field of reinforcement learning, then progress to the main streams of value-based and policy-based methods. Our survey will cover central algorithms in deep reinforcement learning, including the deep $Q$-network, trust region policy optimisation, and asynchronous advantage actor-critic. In parallel, we highlight the unique advantages of deep neural networks, focusing on visual understanding via reinforcement learning. To conclude, we describe several current areas of research within the field.

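Motivation and Objectives

Motivate and define reinforcement learning and its challenges.
Explain how deep learning lets reinforcement learning scale to high-dimensional problems.
Review the core DRL paradigms: value-based, policy-based, and actor-critic methods.
Highlight major DRL successes and common benchmarks.
Discuss ongoing research directions and practical considerations in DRL.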

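Proposed Approach

Give a structured overview of RL fundamentals and Markov decision processes.
Describe the value-function and policy-search frameworks and their equations.
Introduce DRL techniques such as DQN, experience replay, and target networks.
Explain improvements to Q-learning (e.g. double Q-learning, distributional DQN) and policy gradient methods (e.g. actor-critic).
Discuss planning versus learning, model-based versus model-free methods, and sample efficiency.
Survey applications and benchmarks (e.g. Atari, robotics) and future challenges.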
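
As a rough illustration of the stabilisation techniques listed above, the sketch below shows a uniform experience replay buffer and the bootstrapped targets used by DQN and by double Q-learning. It is a minimal sketch under stated assumptions, not the survey's code: the names `ReplayBuffer`, `dqn_targets`, and `double_dqn_targets` are hypothetical, and the Q-values are passed in as plain NumPy arrays of shape (batch, actions) rather than produced by a neural network.

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Hypothetical fixed-size store of (state, action, reward, next_state, done)."""

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive time steps.
        batch = self.rng.sample(list(self.buffer), batch_size)
        states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)


def dqn_targets(rewards, dones, q_next_target, gamma=0.99):
    """DQN target: r + gamma * max_a Q_target(s', a); no bootstrap at terminal states."""
    not_done = 1.0 - np.asarray(dones, dtype=np.float64)
    return rewards + gamma * not_done * q_next_target.max(axis=1)


def double_dqn_targets(rewards, dones, q_next_online, q_next_target, gamma=0.99):
    """Double Q-learning target: the online network selects the next action,
    and the target network evaluates it."""
    not_done = 1.0 - np.asarray(dones, dtype=np.float64)
    best_actions = q_next_online.argmax(axis=1)
    chosen = q_next_target[np.arange(len(best_actions)), best_actions]
    return rewards + gamma * not_done * chosen
```

In a full training loop, `q_next_target` would come from a periodically updated copy of the online network, which keeps the regression target fixed between updates. Separating action selection from action evaluation in `double_dqn_targets` is what reduces the overestimation introduced by the `max` in `dqn_targets`.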

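Experimental Results

Research Questions

RQ1: What are the main DRL methods for learning from high-dimensional inputs?
RQ2: How do value-based and policy-based DRL methods compare with and complement each other?
RQ3: Which key techniques stabilise DRL training (e.g. experience replay, target networks)?
RQ4: Which benchmarks and applications demonstrate the capabilities and limitations of DRL?
RQ5: What are the open challenges and future research directions for DRL?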

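Key Findings

DRL makes it possible to learn control policies directly from high-dimensional sensory inputs such as images.
The Atari benchmark and successes such as AlphaGo illustrate DRL's potential to move beyond hand-crafted features.
Techniques such as experience replay and target networks are critical for stabilising DRL training.
Hybrid actor-critic methods combine value functions with policy optimisation to trade off bias against variance.
Deep networks provide powerful representations that mitigate the curse of dimensionality in RL.
DRL applications span robotics, games, and visuomotor tasks, showing broad potential alongside remaining challenges.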
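
The bias-variance trade-off behind actor-critic methods can be made concrete with two ways of scoring the actions in an episode. The sketch below is illustrative only and assumes hypothetical helper names (`discounted_returns`, `td_advantages`) and a critic whose value estimates are supplied as a plain array. Monte Carlo returns use only observed rewards, so they are unbiased but noisy; one-step TD advantages bootstrap from the critic, which lowers variance but introduces bias whenever the critic is inaccurate.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.99):
    """Monte Carlo returns G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def td_advantages(rewards, values, gamma=0.99):
    """One-step TD advantages A_t = r_t + gamma * V(s_{t+1}) - V(s_t).

    `values` holds the critic's estimates V(s_0), ..., V(s_T); the last entry is the
    value of the state reached after the final reward (0.0 if that state is terminal).
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    return rewards + gamma * values[1:] - values[:-1]
```

For example, `td_advantages([1, 1, 1], [1.0, 1.0, 1.0, 0.0], gamma=0.5)` scores each step against the critic's prediction, whereas `discounted_returns([1, 1, 1], gamma=0.5)` ignores the critic entirely. A policy gradient step would weight the log-probability of each action by one of these signals.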


This review was generated by AI and checked by human editors.