QUICK REVIEW

[Paper Review] ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation

Wenlong Huang, Chen Wang|arXiv (Cornell University)|Sep 3, 2024

Semantic Web and Ontologies | Cited by 8

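ONE-SENTENCE SUMMARY

ReKep represents manipulation tasks as Relational Keypoint Constraints (ReKep): relations among keypoints grounded in 3D, generated automatically from language and RGB-D observations and solved in real time through hierarchical optimization, enabling multi-stage, in-the-wild robotic manipulation.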

ABSTRACT

Representing robotic manipulation tasks as constraints that associate the robot and the environment is a promising way to encode desired robot behaviors. However, it remains unclear how to formulate the constraints such that they are 1) versatile to diverse tasks, 2) free of manual labeling, and 3) optimizable by off-the-shelf solvers to produce robot actions in real-time. In this work, we introduce Relational Keypoint Constraints (ReKep), a visually-grounded representation for constraints in robotic manipulation. Specifically, ReKep is expressed as Python functions mapping a set of 3D keypoints in the environment to a numerical cost. We demonstrate that by representing a manipulation task as a sequence of Relational Keypoint Constraints, we can employ a hierarchical optimization procedure to solve for robot actions (represented by a sequence of end-effector poses in SE(3)) with a perception-action loop at a real-time frequency. Furthermore, in order to circumvent the need for manual specification of ReKep for each new task, we devise an automated procedure that leverages large vision models and vision-language models to produce ReKep from free-form language instructions and RGB-D observations. We present system implementations on a wheeled single-arm platform and a stationary dual-arm platform that can perform a large variety of manipulation tasks, featuring multi-stage, in-the-wild, bimanual, and reactive behaviors, all without task-specific data or environment models. Website at https://rekep-robot.github.io/.
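
As a rough illustration of what such a constraint might look like, the sketch below writes two ReKep-style functions for a hypothetical tea-pouring scene. The keypoint indices, the 10 cm offset, the tolerances, and the function names are all assumptions made for this example and are not taken from the paper. The only conventions borrowed from this review are that a function maps 3D keypoints to a numerical cost and that the constraint counts as satisfied when the cost is at most zero.

```python
import numpy as np

# Hypothetical keypoint layout (rows of an (N, 3) array, world frame, metres):
#   0 = teapot spout, 1 = cup opening, 2 = teapot handle


def spout_above_cup(keypoints: np.ndarray) -> float:
    """Sub-goal style constraint: spout sits 10 cm above the cup opening.

    Returns a cost that is <= 0 when the spout is within 1 cm of the target.
    """
    spout, cup = keypoints[0], keypoints[1]
    target = cup + np.array([0.0, 0.0, 0.10])
    return float(np.linalg.norm(spout - target) - 0.01)


def teapot_level(keypoints: np.ndarray) -> float:
    """Path style constraint: the handle-to-spout axis stays near horizontal.

    Returns a cost that is <= 0 when the axis tilts less than 15 degrees.
    """
    spout, handle = keypoints[0], keypoints[2]
    axis = (spout - handle) / np.linalg.norm(spout - handle)
    # |z component| of a unit vector equals sin(tilt angle from horizontal).
    return float(abs(axis[2]) - np.sin(np.deg2rad(15.0)))


keypoints = np.array([
    [0.50, 0.00, 0.30],  # spout
    [0.50, 0.00, 0.20],  # cup opening
    [0.35, 0.00, 0.30],  # handle
])

costs = {
    "spout_above_cup": spout_above_cup(keypoints),
    "teapot_level": teapot_level(keypoints),
}
satisfied = all(cost <= 0.0 for cost in costs.values())
```

Because each constraint is an ordinary function of a NumPy array, the same functions can be re-evaluated every time the tracked keypoints are updated.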

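RESEARCH MOTIVATION AND GOALS

Provide a versatile, scalable constraint-based representation for robotic manipulation that does not depend on task-specific data or environment models.
Automate constraint specification from RGB-D input and natural-language instructions using large vision models (LVMs) and vision-language models (VLMs).
Enable real-time hierarchical optimization that produces SE(3) end-effector trajectories inside a perception-action loop.
Demonstrate multi-stage, in-the-wild, bimanual, and disturbance-reactive manipulation on real robots, without task-specific data.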

PROPOSED METHOD

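Defines Relational Keypoint Constraints (ReKep) as Python functions that map 3D keypoints to a numerical cost; a constraint is satisfied when f(k) ≤ 0.
Decomposes a task into stages, each with sub-goal constraints and path constraints, which enables hierarchical optimization over SE(3) end-effector poses.
Solves the per-stage sub-goal and path problems as constrained optimization with auxiliary costs (such as collision avoidance and reachability), using SciPy (Dual Annealing + SLSQP): the initial solve takes about 1 s, and subsequent warm-started replanning runs at roughly 10 Hz.
Uses a forward keypoint model under a rigidity assumption to relate end-effector motion to keypoint displacement over a short horizon (0.1 s), enabling high-frequency closed-loop control.
Generates ReKep automatically from RGB-D observations and free-form language: DINOv2 proposes keypoint candidates, and GPT-4o writes the ReKep constraints as Python code expressed in arithmetic relations between keypoints (distances, dot products, rotations).
Proposes keypoints using SAM masks and clustering, projects them into the world frame, and tracks them at 20 Hz for real-time feedback.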
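
The optimization and forward-model steps above can be sketched with SciPy as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the pose is reduced to a position plus yaw instead of a full SE(3) pose, the scene values, bounds, penalty weight, and all function names are hypothetical, and pairing a penalized global search with an explicitly constrained SLSQP refinement is only one possible arrangement. It uses `scipy.optimize.dual_annealing` for the first solve and `scipy.optimize.minimize` with `method="SLSQP"`, warm-started from the previous solution, to replan after the target keypoint moves.

```python
import numpy as np
from scipy.optimize import dual_annealing, minimize

# Hypothetical scene values (metres, world frame).
EE_START = np.array([0.30, 0.00, 0.15])    # current end-effector position
KP_OFFSET = np.array([0.05, 0.00, -0.05])  # grasped keypoint in the end-effector frame
BOUNDS = [(0.0, 1.0), (-0.5, 0.5), (0.0, 0.6), (-np.pi, np.pi)]  # x, y, z, yaw


def forward_keypoint(x):
    """Rigid forward model: the grasped keypoint moves with the end-effector.

    x = [px, py, pz, yaw] is a simplified pose, not a full SE(3) pose.
    """
    c, s = np.cos(x[3]), np.sin(x[3])
    rot_z = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return x[:3] + rot_z @ KP_OFFSET


def subgoal_cost(x, target_kp):
    """ReKep-style constraint, satisfied when <= 0: the grasped keypoint is
    within 1 cm of a point 5 cm above the target keypoint."""
    goal = target_kp + np.array([0.0, 0.0, 0.05])
    return np.linalg.norm(forward_keypoint(x) - goal) - 0.01


def motion_cost(x):
    """Auxiliary cost: prefer short translations and small yaw changes."""
    return np.sum((x[:3] - EE_START) ** 2) + 0.01 * x[3] ** 2


def penalized(x, target_kp):
    """Unconstrained objective for the global search (constraint as a penalty)."""
    return motion_cost(x) + 100.0 * max(0.0, subgoal_cost(x, target_kp))


def refine(x0, target_kp):
    """Local SLSQP solve; SciPy inequality constraints require fun(x) >= 0."""
    cons = [{"type": "ineq", "fun": lambda x: -subgoal_cost(x, target_kp)}]
    return minimize(motion_cost, x0, method="SLSQP", bounds=BOUNDS, constraints=cons)


# First solve: global search, then local refinement.
target = np.array([0.55, 0.20, 0.25])
init = dual_annealing(penalized, BOUNDS, args=(target,), maxiter=200)
plan = refine(init.x, target)

# Replanning: the target keypoint moved, so warm-start SLSQP from the last plan.
moved_target = target + np.array([0.03, -0.02, 0.0])
replan = refine(plan.x, moved_target)
```

Only the first solve pays for the global search. Each later solve starts from the previous answer, which is the idea behind the slower initial solve and faster replanning described above.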

EXPERIMENTAL RESULTS

Research Questions

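RQ1: Without task-specific data, can ReKep be formed automatically from language and RGB-D input and used to synthesize manipulation behaviors?
RQ2: How well does the system generalize to novel objects and novel manipulation strategies in in-the-wild settings?
RQ3: What are the failure modes of each system module, and how much does each contribute to overall performance?
RQ4: Can the method achieve multi-stage, bimanual, and reactive manipulation with real-time replanning?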

Key Findings

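The framework achieves multi-stage, in-the-wild, bimanual, and disturbance-reactive manipulation on two robot platforms, without task-specific data or environment models.
Automatic ReKep generation with large vision models and vision-language models enables open-world task specification from language and RGB-D observations, with constraints grounded in semantic keypoints.
Solving sub-goal and path constraints stage by stage through hierarchical optimization yields real-time closed-loop control (~10 Hz).
The method remains robust and reactive under disturbances. Success rates vary by task and condition, and the identifiable failure modes arise mainly in point tracking and in keypoint-proposal/VLM accuracy.
The ablation study attributes most failures to the keypoint tracking and proposal/VLM modules, while optimization stays comparatively robust within its time budget.
The garment-folding study reveals diverse, category-specific strategies emerging under GPT-4o guidance, indicating open-ended strategic behavior.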

This review was generated by AI and checked by a human editor.