QUICK REVIEW

[Paper Review] ReLMoGen: Leveraging Motion Generation in Reinforcement Learning for Mobile Manipulation

Fei Xia, Chengshu Li|arXiv (Cornell University)|Aug 18, 2020

Reinforcement Learning in Robotics|References: 58|Cited by: 38

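ONE-SENTENCE SUMMARY

ReLMoGen lifts the action space to subgoals for a motion generator and pairs that motion generator with a reinforcement learning policy, which lets it solve long-horizon mobile manipulation tasks efficiently and transfer well between different motion planners.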

ABSTRACT

Many Reinforcement Learning (RL) approaches use joint control signals (positions, velocities, torques) as action space for continuous control tasks. We propose to lift the action space to a higher level in the form of subgoals for a motion generator (a combination of motion planner and trajectory executor). We argue that, by lifting the action space and by leveraging sampling-based motion planners, we can efficiently use RL to solve complex, long-horizon tasks that could not be solved with existing RL methods in the original action space. We propose ReLMoGen -- a framework that combines a learned policy to predict subgoals and a motion generator to plan and execute the motion needed to reach these subgoals. To validate our method, we apply ReLMoGen to two types of tasks: 1) Interactive Navigation tasks, navigation problems where interactions with the environment are required to reach the destination, and 2) Mobile Manipulation tasks, manipulation tasks that require moving the robot base. These problems are challenging because they are usually long-horizon, hard to explore during training, and comprise alternating phases of navigation and interaction. Our method is benchmarked on a diverse set of seven robotics tasks in photo-realistic simulation environments. In all settings, ReLMoGen outperforms state-of-the-art Reinforcement Learning and Hierarchical Reinforcement Learning baselines. ReLMoGen also shows outstanding transferability between different motion generators at test time, indicating a great potential to transfer to real robots.

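MOTIVATION AND OBJECTIVES

Motivate and address the exploration and long-horizon challenges of mobile manipulation tasks.
Propose a framework that lifts actions to subgoals inside the reinforcement learning loop, to be consumed by a motion generator.
Demonstrate improved performance and sample efficiency on navigation, interactive navigation, and mobile manipulation tasks.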

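PROPOSED METHOD

Introduces a lifted MDP in which a subgoal a' directs the motion generator (MG) to produce low-level actions.
Two variants of the subgoal generation policy (SGP): continuous (SGP-R) and discrete (SGP-D), trained with SAC and DQN respectively.
A motion generator that combines a planner (RRT-Connect or PRM) with a trajectory controller to reach the subgoal.
Defines lifted transition and reward functions: the MG outputs a sequence of low-level actions, and R' accumulates the rewards collected along that sequence.
Trains the SGP to predict subgoals from RGB-D, LiDAR, and task-information observations, so that navigation and interaction can use base and arm subgoals.
Demonstrates transferability by swapping the motion planner at test time without retraining.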
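
The lifted transition and reward described above can be sketched as a thin wrapper around a low-level environment. This is a minimal illustration under stated assumptions, not the authors' implementation: LineWorld, straight_line_planner, and LiftedEnv are hypothetical names, the 1-D world stands in for the simulator, and the toy planner stands in for RRT-Connect or PRM. Summing the rewards without discounting is also an assumption; the summary only says that R' accumulates the rewards along the executed sequence.

```python
from typing import Callable, List, Tuple


class LineWorld:
    """Toy 1-D low-level environment (hypothetical stand-in for the simulator)."""

    def __init__(self, goal: int = 5) -> None:
        self.goal = goal
        self.pos = 0

    def step(self, action: int) -> Tuple[int, float, bool]:
        # Low-level action: move one cell left (-1) or right (+1).
        self.pos += action
        done = self.pos == self.goal
        reward = 1.0 if done else -0.1  # small step cost, bonus at the goal
        return self.pos, reward, done


def straight_line_planner(start: int, subgoal: int) -> List[int]:
    """Toy motion generator: the low-level actions that lead from start to subgoal."""
    direction = 1 if subgoal >= start else -1
    return [direction] * abs(subgoal - start)


class LiftedEnv:
    """Exposes subgoals as actions; one lifted step runs a whole low-level sequence."""

    def __init__(self, env: LineWorld,
                 motion_generator: Callable[[int, int], List[int]]) -> None:
        self.env = env
        self.motion_generator = motion_generator

    def step(self, subgoal: int) -> Tuple[int, float, bool]:
        actions = self.motion_generator(self.env.pos, subgoal)
        state, total_reward, done = self.env.pos, 0.0, False
        for action in actions:
            state, reward, done = self.env.step(action)
            total_reward += reward  # lifted reward: sum over the executed sequence
            if done:
                break
        return state, total_reward, done


if __name__ == "__main__":
    lifted = LiftedEnv(LineWorld(goal=5), straight_line_planner)
    print(lifted.step(3))  # three low-level steps collapsed into one lifted step
    print(lifted.step(5))  # two more steps, the last one reaches the goal
```

In this sketch the policy only ever sees subgoals and accumulated rewards, so any callable with the same (start, subgoal) -> actions signature could be passed as motion_generator. That separation is what makes a planner swap at test time possible without touching the learned policy.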

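EXPERIMENTAL RESULTS

Research Questions

RQ1: Can ReLMoGen solve a wide range of robotic tasks involving navigation and manipulation?
RQ2: Does lifting the action space to subgoals improve exploration and sample efficiency in long-horizon mobile manipulation tasks?
RQ3: Is the learned subgoal generation policy robust to changes of the motion planner at test time?
RQ4: How do continuous and discrete subgoal parameterizations compare across tasks with different manipulation requirements?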

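Key Findings

Compared with state-of-the-art reinforcement learning and hierarchical reinforcement learning baselines, ReLMoGen achieves higher task completion rates on all seven tasks.
ReLMoGen converges faster and is more sample-efficient; because it needs fewer gradient updates, training is typically 7x faster in wall-clock time.
The method produces interpretable subgoal maps in which high-value regions coincide with useful interactions (e.g., buttons, cabinet doors).
ReLMoGen transfers to a different motion planner at test time with minimal performance loss, which indicates robustness and practical potential for real robots.
SGP-D (discrete subgoal maps) performs better on tasks that require fine-grained manipulation, while SGP-R (continuous subgoal regression) excels in broader navigation and interaction scenarios.
In the exploration analysis, ReLMoGen explores meaningful interactions and covers a larger region of both physical and latent state space than RL baselines that act in the original action space.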
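
To make the idea of a discrete subgoal map concrete, the sketch below picks the highest-valued cell of a small 2-D value map and converts it to a metric offset. Everything here is an assumption for illustration: select_subgoal is a hypothetical helper, the map is a hand-written nested list rather than a network output, and the conventions (robot at the map centre, rows pointing forward, columns pointing left, 0.1 m cells) are not taken from the paper.

```python
from typing import List, Tuple


def select_subgoal(q_map: List[List[float]],
                   cell_size_m: float = 0.1
                   ) -> Tuple[Tuple[int, int], Tuple[float, float]]:
    """Return the greedy (row, col) cell and its (x, y) offset in metres."""
    # Greedy choice: the cell with the largest value becomes the subgoal.
    cells = [(r, c) for r, row in enumerate(q_map) for c in range(len(row))]
    row, col = max(cells, key=lambda rc: q_map[rc[0]][rc[1]])

    # Assumed convention: robot at the map centre, +x forward (up), +y left.
    centre_row = (len(q_map) - 1) / 2
    centre_col = (len(q_map[0]) - 1) / 2
    x = (centre_row - row) * cell_size_m
    y = (centre_col - col) * cell_size_m
    return (row, col), (x, y)


if __name__ == "__main__":
    # A 3x3 map whose top-right cell has the highest value,
    # e.g. where a button or door handle might be.
    q_map = [
        [0.0, 0.1, 0.9],
        [0.2, 0.3, 0.1],
        [0.0, 0.1, 0.0],
    ]
    print(select_subgoal(q_map))
```

Because the chosen action is a location in the map, plotting the map itself shows which regions the policy values, which is one way to read the interpretability finding above.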

This review was generated by AI and reviewed by a human editor.