QUICK REVIEW

[Paper Review] Deep Reinforcement Learning for Robotic Manipulation-The state of the art

Smruti Amarjyoti | arXiv (Cornell University) | Jan 31, 2017

Reinforcement Learning in Robotics | References: 20 | Cited by: 57

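One-Sentence Summary

A survey that organizes deep reinforcement learning (DRL) methods for robotic manipulation by action space (DAS vs. CAS) and by policy type (SCAS vs. DCAS), detailing the key algorithms, architectures, and their implementations on real and simulated robots.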

ABSTRACT

The focus of this work is to enumerate the various approaches and algorithms that center around application of reinforcement learning in robotic manipulation tasks. Earlier methods utilized specialized policy representations and human demonstrations to constrict the policy. Such methods worked well with continuous state and policy space of robots but failed to come up with generalized policies. Subsequently, high dimensional non-linear function approximators like neural networks have been used to learn policies from scratch. Several novel and recent approaches have also embedded control policy with efficient perceptual representation using deep learning. This has led to the emergence of a new branch of dynamic robot control system called deep reinforcement learning (DRL). This work embodies a survey of the most recent algorithms, architectures and their implementations in simulations and real world robotic platforms. The gamut of DRL architectures are partitioned into two different branches namely, discrete action space algorithms (DAS) and continuous action space algorithms (CAS). Further, the CAS algorithms are divided into stochastic continuous action space (SCAS) and deterministic continuous action space (DCAS) algorithms. Along with elucidating an organisation of the DRL algorithms this work also manifests some of the state of the art applications of these approaches in robotic manipulation tasks.
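To make the DAS / SCAS / DCAS split concrete, here is a minimal sketch of how each family turns a network output into an action. It is an illustration under stated assumptions, not code from the paper. The "network outputs" (Q-values, Gaussian mean and log-standard-deviation, deterministic action) are plain Python lists standing in for trained models, and the function names are hypothetical.

```python
import math
import random

# Hypothetical stand-ins: in a real agent these inputs would be network outputs.


def das_select(q_values):
    """DAS (e.g. DQN): greedy choice of the discrete action with the highest Q-value."""
    return max(range(len(q_values)), key=lambda i: q_values[i])


def scas_select(mean, log_std, rng):
    """SCAS (e.g. a Gaussian policy as used with TRPO): sample each action
    dimension independently from N(mean, exp(log_std)^2)."""
    return [m + math.exp(s) * rng.gauss(0.0, 1.0) for m, s in zip(mean, log_std)]


def dcas_select(mu, noise_scale=0.0, rng=None):
    """DCAS (e.g. DDPG): the actor outputs the action directly. Exploration noise
    is added only when a noise scale and RNG are given; Gaussian noise is a
    simplification (the original DDPG work used Ornstein-Uhlenbeck noise)."""
    if noise_scale == 0.0 or rng is None:
        return list(mu)
    return [m + noise_scale * rng.gauss(0.0, 1.0) for m in mu]


if __name__ == "__main__":
    rng = random.Random(0)
    print(das_select([0.1, 0.7, 0.3]))                 # -> 1
    print(dcas_select([0.2, -0.5]))                    # -> [0.2, -0.5]
    print(scas_select([0.0, 0.0], [-1.0, -1.0], rng))  # a random 2-D action
```

A DAS agent can only return one of the listed indices, whereas both continuous variants return real-valued vectors; the stochastic one samples around a mean, and the deterministic one returns the actor's output unchanged at test time.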

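Motivation and Goals

Motivate the use of DRL for robotic manipulation over traditional hand-designed policies.
Organize DRL methods by discrete vs. continuous action space and by stochastic vs. deterministic policy.
Explain how deep learning enables end-to-end visuomotor control and policy representation.
Highlight practical concerns such as sim-to-real transfer, training stability, and sample efficiency.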

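Proposed Approach

Classify DRL algorithms into discrete action space (DAS) and continuous action space (CAS) methods.
Subdivide CAS into stochastic continuous action space (SCAS) and deterministic continuous action space (DCAS) methods.
Describe the core algorithms (DQN, Double DQN, dueling networks, NAF, policy gradient variants, TRPO, DDPG) and their applicability to robotics.
Discuss visuomotor control with deep networks, and the use of experience replay to stabilize learning.
Summarize implementation considerations, including CNN-based policies, actor-critic structures, and parallel/asynchronous learning.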
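Experience replay and target networks are the stabilizing devices shared by DQN-family and DDPG-style learners. The sketch below shows, under simplifying assumptions, a uniform replay buffer, the DQN and Double DQN bootstrap targets, and DDPG-style soft target updates. Q-values and parameters are plain lists rather than network outputs, and all names (`ReplayBuffer`, `dqn_target`, `double_dqn_target`, `soft_update`) are hypothetical.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity FIFO store of (s, a, r, s_next, done) transitions."""

    def __init__(self, capacity, seed=None):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first
        self.rng = random.Random(seed)

    def push(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement decorrelates consecutive transitions.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def dqn_target(r, done, q_next_target, gamma):
    """DQN: y = r + gamma * max_a' Q_target(s', a'); no bootstrap at terminal states."""
    return r if done else r + gamma * max(q_next_target)


def double_dqn_target(r, done, q_next_online, q_next_target, gamma):
    """Double DQN: the online net selects a', the target net evaluates it."""
    if done:
        return r
    a_star = max(range(len(q_next_online)), key=lambda i: q_next_online[i])
    return r + gamma * q_next_target[a_star]


def soft_update(target_params, online_params, tau):
    """DDPG-style Polyak averaging: theta_target <- tau*theta + (1 - tau)*theta_target."""
    return [tau * w + (1.0 - tau) * wt for w, wt in zip(online_params, target_params)]
```

For example, with `r = 1.0`, `gamma = 0.9`, online next-state values `[1.0, 2.0]` and target values `[3.0, 0.5]`, `dqn_target` gives about 3.7 while `double_dqn_target` gives about 1.45. Decoupling action selection from evaluation is what curbs the overestimation bias of the plain max.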

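Experimental Results

Research Questions

RQ1: Which DRL algorithms and architectures are most effective for robotic manipulation in discrete and continuous action spaces?
RQ2: How do the different policy representations (value-based, policy-based, actor-critic) perform in real-time robotic manipulation tasks?
RQ3: What are the challenges of, and solutions for, learning from visual input and transferring policies from simulation to real robots?
RQ4: Which methods improve the sample efficiency and training stability of DRL in robotics?
RQ5: What gaps remain in transfer learning and reward design for complex manipulation tasks?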

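Key Findings

DAS methods (e.g., DQN variants) suit robotic tasks with discrete actions but struggle in continuous action spaces, where selecting the greedy action requires maximizing Q over infinitely many actions.
CAS methods (policy search, actor-critic) fit continuous robot control more naturally; DDPG is a key deterministic policy gradient method.
NAF and DDPG perform strongly on continuous control tasks and in real-time robotic manipulation such as reaching and door opening.
Experience replay and target networks stabilize DRL training for vision-based robot control.
Asynchronous, parallel data collection substantially reduces training time as the number of robots grows, improving sample efficiency.
The survey identifies gaps in transfer learning and reward design, and calls for more work on inverse reinforcement learning and on intrinsic motivation for temporal abstraction.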
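NAF's strong continuous-control results rest on one structural choice: it restricts the advantage to a quadratic in the action, so the greedy continuous action is available in closed form instead of requiring the explicit maximization that makes DQN-style methods awkward in continuous spaces. The sketch below assumes the standard NAF form Q(s, a) = V(s) - 0.5 * (a - mu(s))^T P(s) (a - mu(s)), with P = L L^T and L lower-triangular with a positive diagonal. V, mu and L are given as plain numbers rather than network heads, and the function names are hypothetical.

```python
def naf_q_value(v, mu, L, a):
    """NAF-style Q(s, a) = V(s) - 0.5 * (a - mu)^T P (a - mu) with P = L L^T.

    L is lower-triangular; a positive diagonal makes P positive definite,
    so Q is maximized exactly at a = mu, where it equals V(s).
    """
    n = len(mu)
    d = [a[i] - mu[i] for i in range(n)]
    # (a - mu)^T L L^T (a - mu) = ||L^T (a - mu)||^2, with (L^T d)_j = sum_i L[i][j] * d[i]
    lt_d = [sum(L[i][j] * d[i] for i in range(n)) for j in range(n)]
    return v - 0.5 * sum(x * x for x in lt_d)


def naf_greedy_action(mu):
    """The greedy action is simply the mu head's output; no search over actions."""
    return list(mu)


if __name__ == "__main__":
    v, mu = 2.0, [0.5, -0.5]
    L = [[1.0, 0.0], [0.5, 2.0]]  # hypothetical lower-triangular factor
    print(naf_q_value(v, mu, L, naf_greedy_action(mu)))  # -> 2.0
    print(naf_q_value(v, mu, L, [1.5, -0.5]))            # -> 1.5
```

Because the quadratic term is never negative, every other action scores at most V(s), so the actor-like output mu doubles as the exact argmax of Q.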

This review was generated by AI and reviewed by human editors.