QUICK REVIEW

[Paper Review] SayTap: Language to Quadrupedal Locomotion

Yujin Tang, Wenhao Yu | arXiv (Cornell University) | Jun 13, 2023

Human Pose and Action Recognition | Cited by 11

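One-Sentence Summary

This paper proposes foot contact patterns as the interface between natural-language instructions and a DRL-based quadrupedal locomotion controller, enabling flexible, language-driven locomotion control that transfers to real hardware.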

ABSTRACT

Large language models (LLMs) have demonstrated the potential to perform high-level planning. Yet, it remains a challenge for LLMs to comprehend low-level commands, such as joint angle targets or motor torques. This paper proposes an approach to use foot contact patterns as an interface that bridges human commands in natural language and a locomotion controller that outputs these low-level commands. This results in an interactive system for quadrupedal robots that allows the users to craft diverse locomotion behaviors flexibly. We contribute an LLM prompt design, a reward function, and a method to expose the controller to the feasible distribution of contact patterns. The results are a controller capable of achieving diverse locomotion patterns that can be transferred to real robot hardware. Compared with other design choices, the proposed approach enjoys more than 50% success rate in predicting the correct contact patterns and can solve 10 more tasks out of a total of 30 tasks. Our project site is: https://saytap.github.io.

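Motivation and Goals

Enable intuitive human-robot interaction for quadrupeds by bridging natural language and low-level locomotion control.
Propose foot contact patterns as a compact interface between natural language and the locomotion controller.
Build an LLM-to-pattern module through prompt design and train a DRL-based controller to achieve diverse, real-time locomotion.
Demonstrate that the learned controller transfers from simulation to a real quadruped (Unitree A1).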

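Proposed Method

Design an LLM prompting strategy that translates arbitrary natural-language commands into a $4 \times L_w$ foot contact pattern template of 0s and 1s.
Use a random pattern generator during training to create diverse contact pattern templates across gait types (BOUND, TROT, PACE, STAND_STILL, STAND_3LEGS).
Train a DRL policy (PPO in IsaacGym) that takes proprioception, velocity commands, and the desired contact pattern as input and outputs joint positions.
Apply a bilateral symmetry trick to the policy output to make gaits more natural and narrow the sim-to-real gap.
Expose the controller to the distribution of contact patterns, with a reward tailored to contact timing rather than explicit trajectories.
Convert LLM-generated contact patterns into low-level commands on real hardware without extensive fine-tuning.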
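
The pattern templates and the contact-timing reward described in the list above can be illustrated with a small sketch. It is not the authors' implementation: the foot order (FL, FR, RL, RR), the 50% duty cycle, the cycle-length range, the reading of $L_w$ as a sliding-window length, and the helper names `make_template`, `random_template`, `window`, and `contact_penalty` are all assumptions made for illustration.

```python
import random

# Row order of the 4 x T template. This ordering is an assumption.
FEET = ("FL", "FR", "RL", "RR")
GAITS = ("BOUND", "TROT", "PACE", "STAND_STILL", "STAND_3LEGS")

# Feet that move together in each two-beat gait (indices into FEET).
_PAIRS = {
    "BOUND": ((0, 1), (2, 3)),  # front pair, then rear pair
    "TROT": ((0, 3), (1, 2)),   # diagonal pairs
    "PACE": ((0, 2), (1, 3)),   # lateral pairs
}


def make_template(gait, cycle_len=20, lifted_foot=0):
    """Return a 4 x cycle_len list of 0/1 values (1 = foot on the ground)."""
    pattern = [[1] * cycle_len for _ in FEET]
    half = cycle_len // 2
    if gait in _PAIRS:
        first, second = _PAIRS[gait]
        for t in range(cycle_len):
            # The first group is in contact during the first half of the
            # cycle, the second group during the second half.
            airborne = second if t < half else first
            for foot in airborne:
                pattern[foot][t] = 0
    elif gait == "STAND_3LEGS":
        pattern[lifted_foot] = [0] * cycle_len  # one foot stays in the air
    elif gait != "STAND_STILL":
        raise ValueError(f"unknown gait: {gait}")
    return pattern


def random_template(rng, min_len=16, max_len=32):
    """Sample a gait and cycle length (hypothetical training-time sampler)."""
    gait = rng.choice(GAITS)
    cycle_len = rng.randint(min_len, max_len)
    return gait, make_template(gait, cycle_len, rng.randrange(len(FEET)))


def window(template, t, window_len):
    """Slice the next window_len steps starting at t, wrapping cyclically."""
    cycle_len = len(template[0])
    return [[row[(t + k) % cycle_len] for k in range(window_len)]
            for row in template]


def contact_penalty(desired, actual):
    """Negative count of feet whose contact state differs from the target."""
    return -float(sum(d != a for d, a in zip(desired, actual)))


if __name__ == "__main__":
    gait, template = random_template(random.Random(0))
    print(gait)
    for name, row in zip(FEET, window(template, t=0, window_len=8)):
        print(name, "".join(map(str, row)))
```

Penalizing only mismatched contact states, rather than tracking reference foot trajectories, leaves the policy free to choose how each foot moves between contacts, which is the property the reward item in the list points to.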

Figure 1: Illustration of the results on a physical quadrupedal robot. We show two test commands at the top, and the snapshots of the robot in the top row of the figure. The row in the middle shows the desired contact patterns translated from the commands by an LLM (the pattern in between the comman

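Experimental Results

Research Questions

RQ1: Can foot contact patterns serve as an effective interface between natural language and low-level quadrupedal control?
RQ2: How well can the LLM map unstructured commands to feasible contact pattern templates across multiple gaits?
RQ3: Can the DRL controller accomplish the main locomotion task and follow the specified contact pattern at the same time, and transfer from simulation to a real robot?
RQ4: Can the language-driven interface handle unstructured and ambiguous natural-language instructions in practice?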

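Key Findings

Across 30 tasks, the LLM-based interface predicts the correct contact pattern with an accuracy about 50% higher than the two baselines.
In simulation, the learned controller tracks the commanded linear velocity while producing contact patterns close to the desired ones, and it transfers to a real Unitree A1 without fine-tuning.
The foot contact pattern interface outperforms the discrete-gait and sinusoidal-parameter baselines in both flexibility and accuracy.
The system responds to both explicit instructions and unstructured natural-language expressions, enabling expressive human-robot interaction.
The approach is deployed successfully on a real robot, with supporting video evidence (Figure 1).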
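
Measuring how often the LLM predicts the correct contact pattern requires turning its text reply into a 0/1 matrix and comparing it with a reference. The sketch below shows one way this could be done. The reply format (one line per foot, such as `FL: 1100`), the exact-match criterion, and the helper names `parse_pattern`, `is_correct`, and `success_rate` are assumptions for illustration and are not taken from the paper.

```python
import re

FEET = ("FL", "FR", "RL", "RR")  # assumed row order

# Matches lines such as "FL: 11110000" (hypothetical reply format).
_LINE = re.compile(r"^\s*(FL|FR|RL|RR)\s*:\s*([01]+)\s*$")


def parse_pattern(text):
    """Parse an LLM reply into a 4 x T list of 0/1 values, or raise ValueError."""
    rows = {}
    for line in text.splitlines():
        match = _LINE.match(line)
        if match:
            rows[match.group(1)] = [int(c) for c in match.group(2)]
    missing = [foot for foot in FEET if foot not in rows]
    if missing:
        raise ValueError(f"missing rows for: {missing}")
    if len({len(row) for row in rows.values()}) != 1:
        raise ValueError("rows differ in length")
    return [rows[foot] for foot in FEET]


def is_correct(reply, expected):
    """Exact-match check; malformed replies count as failures."""
    try:
        return parse_pattern(reply) == expected
    except ValueError:
        return False


def success_rate(outcomes):
    """Fraction of tasks whose predicted pattern was judged correct."""
    return sum(outcomes) / len(outcomes)


if __name__ == "__main__":
    expected = [[1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0]]
    replies = [
        "FL: 1100\nFR: 0011\nRL: 0011\nRR: 1100",  # well formed, matches
        "FL: 1100\nFR: 0011",                      # two feet missing
    ]
    print(success_rate([is_correct(r, expected) for r in replies]))
```

Treating unparseable replies as failures keeps the score conservative: a reply that cannot be converted into a feasible pattern cannot be sent to the controller either.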

Figure 2: Overview of the proposed approach. In addition to the robot’s proprioceptive sensory data and task commands (e.g., following a desired linear velocity $\hat{v}_{x}$ ), the locomotion controller accepts desired foot contact patterns as input, and outputs desired joint positions. The foot co

This review was generated by AI and checked by a human editor.