QUICK REVIEW

[Paper Review] HomeRobot: Open-Vocabulary Mobile Manipulation

Sriram Yenamandra, Arun Ramachandran|arXiv (Cornell University)|Jun 20, 2023

Multimodal Machine Learning Applications | Cited by 13

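One-Sentence Summary

Defines Open-Vocabulary Mobile Manipulation (OVMM) and provides a reproducible benchmark and software stack (simulation and real world) for evaluating end-to-end mobile manipulation over an open set of objects, using the Hello Robot Stretch; reports baselines that show evidence of sim-to-real transfer and reach roughly a 20% success rate in the real world.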

ABSTRACT

HomeRobot (noun): An affordable compliant robot that navigates homes and manipulates a wide range of objects in order to complete everyday tasks. Open-Vocabulary Mobile Manipulation (OVMM) is the problem of picking any object in any unseen environment, and placing it in a commanded location. This is a foundational challenge for robots to be useful assistants in human environments, because it involves tackling sub-problems from across robotics: perception, language understanding, navigation, and manipulation are all essential to OVMM. In addition, integration of the solutions to these sub-problems poses its own substantial challenges. To drive research in this area, we introduce the HomeRobot OVMM benchmark, where an agent navigates household environments to grasp novel objects and place them on target receptacles. HomeRobot has two components: a simulation component, which uses a large and diverse curated object set in new, high-quality multi-room home environments; and a real-world component, providing a software stack for the low-cost Hello Robot Stretch to encourage replication of real-world experiments across labs. We implement both reinforcement learning and heuristic (model-based) baselines and show evidence of sim-to-real transfer. Our baselines achieve a 20% success rate in the real world; our experiments identify ways future research work can improve performance. See videos on our website: https://ovmm.github.io/.

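Motivation and Goals

Motivate and formalize OVMM as a core challenge for home robots, combining perception, navigation, and manipulation over an open-world set of objects.
Provide a reusable benchmark and infrastructure, in both simulation and the real world, to promote reproducible OVMM research.
Present baseline methods (heuristic planning and RL) and evaluate sim-to-real transfer of the navigation and place skills.
Enable replication and comparison across labs through a standardized hardware stack and API.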

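Proposed Method

Introduce the OVMM task: in an unseen single-floor home, move an object from a start_receptacle to a goal_receptacle, where the object set is open-vocabulary.
The simulation dataset uses Habitat and contains 60 scenes from HSSD, 2,535 objects across 129 categories, and 21 receptacle categories; it defines train/validation/test splits with seen and unseen categories and instances.
Provide a real-world benchmark environment (a controlled apartment) and a low-cost Hello Robot Stretch platform for reproducible experiments.
Implement the HomeRobot library, which exposes the same API in simulation and in the real world, enabling end-to-end benchmarking and modular baselines.
Develop two baseline policies: a heuristic baseline based on motion planning that uses DETIC for object masks, and an RL baseline trained with DDPPO on depth, segmentation, and proprioceptive inputs.
Evaluate sub-skills including perception (ground truth vs. DETIC), navigation, gaze, grasping, and placement, and analyze the sim-to-real gap.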
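
To make the task definition and the "same API in simulation and the real world" idea concrete, here is a minimal Python sketch. It is not the HomeRobot library's API: the names `OVMMEpisode`, `Env`, `run_episode`, `ScriptedEnv`, the skill names in `SKILL_ORDER`, and the instruction wording are all hypothetical and chosen only for illustration. The sketch assumes a modular baseline that runs sub-skills in a fixed order and stops at the first failure.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


@dataclass(frozen=True)
class OVMMEpisode:
    """Hypothetical task spec: move an object between two receptacles."""
    object_category: str
    start_receptacle: str
    goal_receptacle: str

    def instruction(self) -> str:
        # Assumed wording; the source only states the three task fields.
        return (f"Move the {self.object_category} from the "
                f"{self.start_receptacle} to the {self.goal_receptacle}.")


class Env(Protocol):
    """Hypothetical interface that a simulated and a real backend both satisfy."""

    def run_skill(self, skill: str, episode: OVMMEpisode) -> bool:
        ...


# Assumed skill sequence, loosely following the sub-skills named above.
SKILL_ORDER = ["nav_to_object", "gaze", "pick", "nav_to_receptacle", "place"]


def run_episode(env: Env, episode: OVMMEpisode) -> Dict[str, bool]:
    """Run skills in order; skills after the first failure stay False."""
    results = {skill: False for skill in SKILL_ORDER}
    for skill in SKILL_ORDER:
        results[skill] = env.run_skill(skill, episode)
        if not results[skill]:
            break
    return results


class ScriptedEnv:
    """Toy backend for illustration: fails at one chosen skill, if any."""

    def __init__(self, fail_at: Optional[str] = None) -> None:
        self.fail_at = fail_at

    def run_skill(self, skill: str, episode: OVMMEpisode) -> bool:
        return skill != self.fail_at


if __name__ == "__main__":
    episode = OVMMEpisode("cup", "table", "counter")
    print(episode.instruction())
    print(run_episode(ScriptedEnv(fail_at="pick"), episode))
```

Because `run_episode` depends only on the `Env` protocol, the same evaluation loop could in principle be pointed at a simulator-backed or a robot-backed implementation, which is the property the shared API is meant to provide.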

Figure 1: Open-Vocabulary Mobile Manipulation requires agents to search for a previously unseen object at a particular location, and move it to the correct receptacle.

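Experimental Results

Research Questions

RQ1: How can OVMM be defined and benchmarked, in simulation and in the real world, for home environments with open-vocabulary objects?
RQ2: How do the heuristic and RL baselines perform on the OVMM task, and how does perception quality (ground truth vs. DETIC) affect the results?
RQ3: To what extent do the navigation and place skills transfer from simulation to the real world in OVMM?
RQ4: Which bottlenecks (perception, navigation, manipulation) limit real-world OVMM success, and how can a unified robot stack address them?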

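Key Findings

In simulation, the RL baseline outperforms the heuristic method on navigation and placement, but perception quality strongly affects the performance of every method.
Ground-truth perception yields higher success rates than DETIC-based perception, indicating that perception is a major bottleneck.
In simulation, the partial and overall success rates show that RL with ground-truth segmentation scores higher than the heuristic baseline, while DETIC segmentation lowers the performance of both methods.
In real-world experiments, RL achieves a 20% overall success rate, 5 percentage points above the heuristic baseline, owing to gains on the grasping and placement subtasks.
Detections from the open-vocabulary model DETIC introduce misclassification failures that substantially affect performance in both simulation and the real world.
The HomeRobot OVMM stack enables reproducible end-to-end benchmarking across simulation and real hardware, highlighting the sim-to-real gap and the importance of a unified framework.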
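
The findings above refer to partial and overall success rates without defining them. The Python sketch below shows one plausible way to compute such metrics from per-stage outcomes. It assumes that overall success means every stage succeeded and that partial success is the mean fraction of stages completed per episode; the stage names and the function `summarize` are hypothetical, and the episode data is made up for illustration, not taken from the paper.

```python
from typing import Dict, Sequence

# Assumed stage names; the summary does not list the exact stages.
STAGES = ("find_object", "pick", "find_receptacle", "place")


def summarize(episodes: Sequence[Dict[str, bool]]) -> Dict[str, float]:
    """Aggregate per-stage booleans into overall, partial, and per-stage rates."""
    if not episodes:
        raise ValueError("need at least one episode")
    n = len(episodes)
    # Overall: an episode counts only if every stage succeeded.
    overall = sum(all(ep[s] for s in STAGES) for ep in episodes) / n
    # Partial: average fraction of stages completed per episode.
    partial = sum(sum(ep[s] for s in STAGES) / len(STAGES) for ep in episodes) / n
    summary = {"overall_success": overall, "partial_success": partial}
    for stage in STAGES:
        summary[f"{stage}_rate"] = sum(ep[stage] for ep in episodes) / n
    return summary


def episode_failing_after(num_ok: int) -> Dict[str, bool]:
    """Build a made-up episode where only the first `num_ok` stages succeed."""
    return {stage: i < num_ok for i, stage in enumerate(STAGES)}


if __name__ == "__main__":
    # Illustrative data only: 4, 3, 1, and 0 stages completed.
    demo = [episode_failing_after(k) for k in (4, 3, 1, 0)]
    print(summarize(demo))
```

Reporting per-stage rates next to the overall rate makes it easier to see where a pipeline breaks, for example whether failures concentrate in perception-dependent stages when ground-truth segmentation is replaced by a detector.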

Figure 2: A low-cost home robot performing tasks in both a simulated and a real-world environment. We provide both (1) challenging simulated tasks, wherein a mobile manipulator robot must find and grasp multiple seen and unseen objects, and (2) a corresponding real-world robotics stack to allow othe

This review was generated by AI and reviewed by a human editor.