QUICK REVIEW

[Paper Review] Deep Whole-Body Control: Learning a Unified Policy for Manipulation and Locomotion

Zipeng Fu, Xuxin Cheng|arXiv (Cornell University)|Oct 18, 2022

Muscle activation and electromyography studies | Cited by 25

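ONE-SENTENCE SUMMARY

The authors learn a single unified policy that coordinates a quadruped robot with an attached arm during simultaneous manipulation and locomotion, introduce Regularized Online Adaptation to bridge the Sim2Real gap, and use Advantage Mixing to speed up training.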

ABSTRACT

An attached arm can significantly increase the applicability of legged robots to several mobile manipulation tasks that are not possible for the wheeled or tracked counterparts. The standard hierarchical control pipeline for such legged manipulators is to decouple the controller into that of manipulation and locomotion. However, this is ineffective. It requires immense engineering to support coordination between the arm and legs, and error can propagate across modules causing non-smooth unnatural motions. It is also biologically implausible given evidence for strong motor synergies across limbs. In this work, we propose to learn a unified policy for whole-body control of a legged manipulator using reinforcement learning. We propose Regularized Online Adaptation to bridge the Sim2Real gap for high-DoF control, and Advantage Mixing exploiting the causal dependency in the action space to overcome local minima during training the whole-body system. We also present a simple design for a low-cost legged manipulator, and find that our unified policy can demonstrate dynamic and agile behaviors across several task setups. Videos are at https://maniploco.github.io

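RESEARCH MOTIVATION AND GOALS

Make mobile manipulation feasible through tight coordination of arm and leg control
Develop a unified end-to-end policy that covers both manipulation and locomotion
Address sim-to-real transfer without a two-stage teacher-student framework
Demonstrate robust learning on a low-cost hardware platform across diverse task setups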

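PROPOSED METHOD

Build a single neural policy pi whose inputs are the base, arm, and leg states, the previous action, and environment extrinsics, and whose outputs are target joint positions for the arm and legs
Train with reinforcement learning via PPO on a combined manipulation and locomotion reward
Introduce Advantage Mixing, which mixes the manipulation and locomotion advantages in the policy update to decompose credit assignment
Propose Regularized Online Adaptation: learn an environment-extrinsics latent z_mu from privileged simulation data and regularize it toward z_phi, which is inferred from onboard observations, to bridge simulation and reality
Use joint-space position control, tracked by PD torque controllers on the arm and legs, to simplify learning and reduce the sim-to-real gap
Provide a low-cost, untethered hardware platform (Go1 quadruped + WidowX arm) for real-world evaluation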
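
To make the Advantage Mixing item above concrete, here is a minimal pure-Python sketch. It assumes the form usually attributed to the paper: arm actions are weighted mainly by the manipulation advantage, leg actions mainly by the locomotion advantage, and a weight beta that grows from 0 to 1 gradually mixes in the other term. The function names, the linear schedule, and the plain policy-gradient surrogate (rather than PPO's clipped ratio) are illustrative assumptions, not the authors' code.

```python
def mixing_weight(step, anneal_steps):
    """Assumed linear curriculum: beta grows from 0 to 1 over anneal_steps."""
    if anneal_steps <= 0:
        return 1.0
    return min(1.0, max(0.0, step / anneal_steps))


def mixed_advantages(adv_manip, adv_loco, beta):
    """Per-sample advantages used to weight arm and leg log-probabilities.

    beta = 0: the arm sees only the manipulation advantage and the legs
    only the locomotion advantage.
    beta = 1: both action groups see the sum of the two advantages.
    """
    arm = [am + beta * al for am, al in zip(adv_manip, adv_loco)]
    leg = [beta * am + al for am, al in zip(adv_manip, adv_loco)]
    return arm, leg


def surrogate_objective(logp_arm, logp_leg, adv_manip, adv_loco, beta):
    """Batch mean of logp_arm * A_arm + logp_leg * A_leg (vanilla PG form)."""
    arm_adv, leg_adv = mixed_advantages(adv_manip, adv_loco, beta)
    terms = [
        la * aa + ll * lg
        for la, ll, aa, lg in zip(logp_arm, logp_leg, arm_adv, leg_adv)
    ]
    return sum(terms) / len(terms)
```

Early in training (beta near 0) each action group is credited only for its own task, which is one way to read "decomposing credit assignment"; as beta approaches 1 the objective approaches the ordinary joint one.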

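EXPERIMENTAL RESULTS

Research Questions

RQ1: Is a single unified policy more effective than decoupled or partially coupled controllers at coordinating quadruped locomotion and arm manipulation?
RQ2: Does Advantage Mixing speed up learning and improve credit assignment for simultaneous manipulation and locomotion?
RQ3: Can Regularized Online Adaptation achieve robust sim-to-real transfer without a two-stage teacher-student scheme?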

Key Findings

Method	Survival rate	Base acceleration	Velocity error	EE error	Total energy
Unified (Ours)	97.1±0.61	1.00±0.03	0.31±0.03	0.63±0.02	50±0.90
Separate	92.0±0.90	1.40±0.04	0.43±0.07	0.92±0.10	51±0.30
Uncoordinated	94.9±0.61	1.03±0.01	0.33±0.01	0.73±0.02	50±0.28
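
As a quick way to read the table above, the sketch below computes how much lower the unified policy's mean values are relative to each baseline. It uses only the means copied from the table (standard deviations are ignored), and it assumes that lower is better for base acceleration, velocity error, EE error, and total energy. The dictionary keys and helper names are made up for this illustration.

```python
# Mean values copied from the table above; the +/- terms are omitted.
RESULTS = {
    "Unified (Ours)": {"survival": 97.1, "base_accel": 1.00,
                       "vel_error": 0.31, "ee_error": 0.63, "energy": 50.0},
    "Separate": {"survival": 92.0, "base_accel": 1.40,
                 "vel_error": 0.43, "ee_error": 0.92, "energy": 51.0},
    "Uncoordinated": {"survival": 94.9, "base_accel": 1.03,
                      "vel_error": 0.33, "ee_error": 0.73, "energy": 50.0},
}


def relative_reduction(baseline, ours):
    """Fraction by which `ours` is lower than `baseline`."""
    return (baseline - ours) / baseline


def compare(baseline_name, metric, ours_name="Unified (Ours)"):
    """Relative reduction of one metric for the unified policy vs. a baseline."""
    return relative_reduction(RESULTS[baseline_name][metric],
                              RESULTS[ours_name][metric])


if __name__ == "__main__":
    for name in ("Separate", "Uncoordinated"):
        for metric in ("base_accel", "vel_error", "ee_error", "energy"):
            print(f"{name:14s} {metric:10s} {100 * compare(name, metric):5.1f}% lower")
```

On these means, the unified policy's EE error is roughly 32% lower than the Separate baseline's and roughly 14% lower than the Uncoordinated baseline's, while total energy is essentially unchanged.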

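The unified policy outperforms the Separate and Uncoordinated baselines on several metrics, with a higher survival rate and comparable or lower energy use
Advantage Mixing speeds up learning, improves command following for both manipulation and locomotion, and shortens convergence time
Regularized Online Adaptation transfers from simulation to reality better than Rapid Motor Adaptation (RMA) and domain randomization, with lower imitation error in simulation and better end-effector (EE) tracking
The unified policy enlarges the arm's workspace and improves stability under perturbations, indicating strong whole-body coordination between the legs and the arm
Real-world experiments show agile, coordinated leg-arm motion, with a higher task success rate and speed than an MPC+IK baseline controller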
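
The Regularized Online Adaptation finding above depends on how the privileged latent z_mu and the onboard estimate z_phi are pulled toward each other during training. The sketch below assumes a two-term regularizer with stop-gradients (sg), lam * ||z_mu - sg[z_phi]||^2 + ||sg[z_mu] - z_phi||^2, which is how the method is commonly described. Treat that exact form, the weight lam, and all function names as assumptions; this summary only states that z_mu is regularized toward z_phi. The gradients are written out by hand so that the effect of the stop-gradients is visible without an autodiff library.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def roa_regularizer(z_mu, z_phi, lam):
    """Value of lam * ||z_mu - sg[z_phi]||^2 + ||sg[z_mu] - z_phi||^2.

    Stop-gradients do not change the value, only which encoder each
    term is allowed to update.
    """
    d = squared_distance(z_mu, z_phi)
    return lam * d + d


def roa_gradients(z_mu, z_phi, lam):
    """Hand-written gradients that honor the stop-gradients.

    First term: z_phi is held fixed, so only z_mu receives a gradient.
    Second term: z_mu is held fixed, so only z_phi receives a gradient.
    """
    grad_z_mu = [2.0 * lam * (m - p) for m, p in zip(z_mu, z_phi)]
    grad_z_phi = [2.0 * (p - m) for m, p in zip(z_mu, z_phi)]
    return grad_z_mu, grad_z_phi
```

With a small lam the privileged encoder is barely constrained and the onboard estimator simply imitates it; with a larger lam the privileged latent is pushed toward something the onboard estimator can actually reproduce, which is the intuition behind training both in a single stage.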


This review was generated by AI and checked by a human editor.