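[Paper Explained] Self-Supervised Visual Planning with Temporal Skip Connections
This paper proposes Skip Connection Neural Advection (SNA), an occlusion-aware video prediction model with temporal skip connections, together with a distance-based planning objective for visual MPC and a hybrid continuous-discrete action space. The resulting system can plan through occlusions and handle multiple objects.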
In order to autonomously learn wide repertoires of complex skills, robots must be able to learn from their own autonomously collected data, without human supervision. One learning signal that is always available for autonomously collected data is prediction: if a robot can learn to predict the future, it can use this predictive model to take actions to produce desired outcomes, such as moving an object to a particular location. However, in complex open-world scenarios, designing a representation for prediction is difficult. In this work, we instead aim to enable self-supervised robotic learning through direct video prediction: instead of attempting to design a good representation, we directly predict what the robot will see next, and then use this model to achieve desired goals. A key challenge in video prediction for robotic manipulation is handling complex spatial arrangements such as occlusions. To that end, we introduce a video prediction model that can keep track of objects through occlusion by incorporating temporal skip-connections. Together with a novel planning criterion and action space formulation, we demonstrate that this model substantially outperforms prior work on video prediction-based control. Our results show manipulation of objects not seen during training, handling multiple objects, and pushing objects around obstructions. These results represent a significant advance in the range and complexity of skills that can be performed entirely with self-supervised robotic learning.
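Motivation and Objectives
- Enable self-supervised robot learning from autonomously collected data through video prediction.
- Develop an occlusion-aware prediction model that preserves object permanence while an object is occluded.
- Improve planning for vision-based control with a smooth, distance-based cost on pixel positions.
- Plan jointly over discrete and continuous actions within a model predictive control (MPC) framework.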
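Proposed Method
- Introduce the Skip Connection Neural Advection (SNA) model, which extends Dynamic Neural Advection (DNA) with temporal skip connections to preserve object permanence through occlusions.
- Predict the next frame by compositing several transformed past images with learned masks, so that occluded content can be copied back from earlier frames.
- Use a distance-based planning objective: minimize the expected Euclidean distance between the predicted pixel position and the goal, summed over the time horizon T.
- Plan with sampling-based model predictive control, using the cross-entropy method (CEM) over a hybrid action space that combines continuous end-effector motion with discrete lift actions.
- Represent each action as a vector containing horizontal motion and a discrete lift level; during optimization, sampled lift values are rounded to the nearest discrete step.
- Train the video prediction model on randomly collected pushing trajectories, without external supervision.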
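The mask-based compositing step can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `softmax` and `composite`, the pure-Python image representation, and the hand-set mask logits are all assumptions made for illustration (in the model, the transformed images and masks are produced by the network). The sketch only shows how a per-pixel mask can select content from the first frame when the previous frame shows an occluder.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def composite(candidates, mask_logits):
    """Blend candidate images per pixel with softmax-normalized masks.

    candidates:  list of K images, each an H x W list of floats
    mask_logits: list of K logit maps with the same H x W shape
    """
    height, width = len(candidates[0]), len(candidates[0][0])
    out = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            # Masks compete per pixel, so the weights at each pixel sum to 1.
            weights = softmax([m[r][c] for m in mask_logits])
            out[r][c] = sum(w * img[r][c] for w, img in zip(weights, candidates))
    return out


# Toy 1x3 example (hypothetical values): the object (intensity 1.0) at the
# middle pixel is hidden by the arm (intensity 0.2) in the previous frame,
# but it is still visible in the first frame of the sequence.
warped_prev = [[0.0, 0.2, 0.0]]
warped_first = [[0.0, 1.0, 0.0]]
mask_logits = [
    [[5.0, -5.0, 5.0]],   # mask for the previous-frame candidate
    [[-5.0, 5.0, -5.0]],  # mask for the first-frame (skip connection) candidate
]
next_frame = composite([warped_prev, warped_first], mask_logits)
```

In this toy case the middle pixel of `next_frame` is dominated by the first-frame candidate, so the object reappears even though the previous frame showed only the occluder. A model restricted to the previous frame has no such source to copy from.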
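Experimental Results
Research Questions
- RQ1: Can the occlusion-aware video prediction model keep tracking a designated pixel after it has been occluded during manipulation?
- RQ2: Does the distance-based planning objective improve long-horizon visual MPC performance under occlusion?
- RQ3: Can a hybrid action space (continuous motion plus discrete lift) be integrated effectively into sampling-based MPC for tabletop manipulation?
- RQ4: How does the proposed SNA model compare with prior DNA-based methods on occlusion-heavy tasks and on unseen objects?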
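Key Findings
- Compared with the prior DNA method, the SNA model substantially improves planning performance on occlusion-heavy tasks.
- The expected-distance cost on predicted pixel positions is better suited to long-horizon planning than the probability-based cost.
- The hybrid action space lets the end effector lift over obstacles, producing more natural and shorter trajectories.
- SNA maintains prediction quality for occluded objects, which makes planning with unseen objects and with multiple objects possible.
- The experiments demonstrate control guided by self-supervised video prediction on pushing tasks that involve occlusions and multiple objects.
- Combined with the new planning cost, SNA matches or outperforms prior methods on both seen and unseen objects.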
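The expected-distance cost and the hybrid action space can be sketched together in a small sampling-based planner. This is a rough illustration under stated assumptions, not the paper's code: `toy_predict` is a hypothetical stand-in for the learned video prediction model, the two lift levels, the 0.8 push probability, and all CEM hyperparameters are invented for the example, and the designated-pixel distribution is stored as a dictionary rather than an image-sized probability map.

```python
import math
import random

LIFT_LEVELS = (0.0, 1.0)  # assumed discrete lift heights: on the table / raised


def expected_distance(dist, goal):
    """Expected Euclidean distance from the designated pixel to the goal.

    dist maps (x, y) positions to probabilities that sum to 1.
    """
    return sum(p * math.hypot(x - goal[0], y - goal[1]) for (x, y), p in dist.items())


def round_lift(value):
    """Snap a continuous sample to the nearest discrete lift level."""
    return min(LIFT_LEVELS, key=lambda level: abs(level - value))


def toy_predict(dist, action):
    """Hypothetical stand-in for the learned video prediction model.

    A lowered gripper pushes the pixel by (dx, dy) with probability 0.8;
    a lifted gripper passes over the object and leaves it in place.
    """
    dx, dy, lift = action
    if lift > 0.0:
        return dict(dist)
    moved = {}
    for (x, y), p in dist.items():
        for pos, w in (((x + dx, y + dy), 0.8), ((x, y), 0.2)):
            moved[pos] = moved.get(pos, 0.0) + p * w
    return moved


def plan_cost(plan, start, goal):
    """Sum the expected distance over every step of the horizon."""
    dist, total = {start: 1.0}, 0.0
    for action in plan:
        dist = toy_predict(dist, action)
        total += expected_distance(dist, goal)
    return total


def cem_plan(start, goal, horizon=3, samples=100, elites=10, iters=3, seed=0):
    """Cross-entropy method over sequences of (dx, dy, lift) actions."""
    rng = random.Random(seed)
    dim = horizon * 3
    mean, std = [0.0] * dim, [1.0] * dim
    best_plan, best_cost = None, float("inf")
    for _ in range(iters):
        scored = []
        for _ in range(samples):
            flat = [rng.gauss(m, s) for m, s in zip(mean, std)]
            # Continuous dx, dy are used as sampled; lift is rounded.
            plan = [(flat[i], flat[i + 1], round_lift(flat[i + 2]))
                    for i in range(0, dim, 3)]
            scored.append((plan_cost(plan, start, goal), flat, plan))
        scored.sort(key=lambda item: item[0])
        if scored[0][0] < best_cost:
            best_cost, best_plan = scored[0][0], scored[0][2]
        # Refit the sampling distribution to the lowest-cost samples.
        top = [flat for _, flat, _ in scored[:elites]]
        mean = [sum(col) / elites for col in zip(*top)]
        std = [max(1e-3, math.sqrt(sum((v - m) ** 2 for v in col) / elites))
               for col, m in zip(zip(*top), mean)]
    return best_plan, best_cost


best_plan, best_cost = cem_plan(start=(0.0, 0.0), goal=(3.0, 0.0))
idle_cost = plan_cost([(0.0, 0.0, 0.0)] * 3, (0.0, 0.0), (3.0, 0.0))
```

Because the cost is an expectation over positions, it decreases smoothly as probability mass moves toward the goal, which gives the sampler a usable signal even when no sampled plan reaches the goal exactly. Rounding only the lift component keeps the Gaussian sampling unchanged while ensuring every evaluated plan uses a valid discrete lift level.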
This explanation was generated by AI and reviewed by a human editor.