QUICK REVIEW

[Paper Review] TidyBot: Personalized Robot Assistance with Large Language Models

Jimmy Wu, Rika Antonova | arXiv (Cornell University) | May 9, 2023

Multimodal Machine Learning Applications | Cited by 9

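One-Sentence Summary

This paper introduces TidyBot, a mobile robot that uses an off-the-shelf LLM to generalize from a small number of text-based examples to unseen objects, achieving high-accuracy tidying in the real world.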

ABSTRACT

For a robot to personalize physical assistance effectively, it must learn user preferences that can be generally reapplied to future scenarios. In this work, we investigate personalization of household cleanup with robots that can tidy up rooms by picking up objects and putting them away. A key challenge is determining the proper place to put each object, as people's preferences can vary greatly depending on personal taste or cultural background. For instance, one person may prefer storing shirts in the drawer, while another may prefer them on the shelf. We aim to build systems that can learn such preferences from just a handful of examples via prior interactions with a particular person. We show that robots can combine language-based planning and perception with the few-shot summarization capabilities of large language models (LLMs) to infer generalized user preferences that are broadly applicable to future interactions. This approach enables fast adaptation and achieves 91.2% accuracy on unseen objects in our benchmark dataset. We also demonstrate our approach on a real-world mobile manipulator called TidyBot, which successfully puts away 85.0% of objects in real-world test scenarios.

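Motivation and Goals

Motivate personalized household cleanup, where the preferred receptacle for an object differs across users and cultures.
Propose a method that uses an LLM to summarize a small number of user-provided placement examples into general rules.
Combine the summarized rules with open-vocabulary perception to guide the robot's actions.
Demonstrate the method on a real-world mobile manipulation system and benchmark its generalization ability.
Provide a public benchmark dataset for evaluating personalized receptacle selection.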

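Proposed Method

Convert a user's object-placement examples into an LLM prompt so that the LLM outputs a code-like summary of rules.
Use the LLM-generated summary to infer general receptacle and category mappings for unseen objects.
Pass the general categories to an open-vocabulary classifier (CLIP) to recognize objects in new scenes.
Implement a robot pipeline: the robot locates an object, classifies it with CLIP using the LLM-derived categories, and applies the personalized rules to move the object to its target receptacle.
Evaluate on a text-based benchmark and in real-world robot experiments with a mobile manipulator.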
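The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the prompt layout, the function names (`build_summarization_prompt`, `choose_receptacle`, `toy_classify`), and the example objects are all hypothetical. The rules dictionary is written by hand where an LLM summary would normally be parsed, and a lookup table stands in for CLIP.

```python
from typing import Callable, Dict, List, Tuple

Placement = Tuple[str, str]  # (object name, receptacle name)


def build_summarization_prompt(examples: List[Placement]) -> str:
    """Format user placements as a code-like prompt that ends with an open summary line.

    The layout is an assumption made for illustration only.
    """
    objects = [obj for obj, _ in examples]
    receptacles = sorted({rec for _, rec in examples})
    lines = [f"objects = {objects!r}", f"receptacles = {receptacles!r}"]
    lines += [f"pick_and_place({obj!r}, {rec!r})" for obj, rec in examples]
    lines.append("# Summary:")
    return "\n".join(lines)


def choose_receptacle(
    object_name: str,
    rules: Dict[str, str],
    classify: Callable[[str, List[str]], str],
) -> str:
    """Classify the object into one of the rule categories, then look up its receptacle."""
    category = classify(object_name, list(rules))
    return rules[category]


# A handful of user-provided placements (hypothetical data).
examples = [
    ("yellow shirt", "drawer"),
    ("dark purple shirt", "closet"),
    ("white socks", "drawer"),
    ("black shirt", "closet"),
]
prompt = build_summarization_prompt(examples)

# Hand-written here; in the described method an LLM summary would supply these rules.
rules = {"light-colored clothing": "drawer", "dark-colored clothing": "closet"}

# Toy stand-in for the open-vocabulary classifier (CLIP in the paper).
toy_labels = {
    "white t-shirt": "light-colored clothing",
    "navy socks": "dark-colored clothing",
}


def toy_classify(object_name: str, categories: List[str]) -> str:
    label = toy_labels[object_name]
    if label not in categories:
        raise ValueError(f"{label!r} is not one of the rule categories")
    return label


target = choose_receptacle("white t-shirt", rules, toy_classify)
print(prompt)
print("white t-shirt ->", target)
```

Keeping the classifier behind a plain callable means the lookup table can be swapped for a real open-vocabulary model without touching the rule-application step.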

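Experimental Results

Research Questions

RQ1: Can LLM summarization generalize a small number of user-provided placements to unseen objects?
RQ2: In the personalized object placement task, do LLM-based generalized rules outperform baselines such as WordNet, ontologies, embeddings, or no summarization?
RQ3: How well does the method transfer from the benchmark scenarios to real-world robot tidying?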

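Key Findings

Summarization-based generalization reaches 91.2% accuracy on unseen objects in the benchmark, where it is compared against the baselines.
The real-world TidyBot correctly puts away 85.0% of objects in the test scenarios.
LLM summarization extracts noun categories (such as light-colored clothing) that guide CLIP-based classification, enabling open-vocabulary perception.
A/B tests and a human study indicate that, on most ranking criteria, people prefer the LLM-summarization-based preferences over the CLIP-based baseline.
Older LLMs and non-summarization baselines perform significantly worse at generalizing to unseen objects.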
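For readers who want to see how an unseen-object accuracy figure of this kind is computed, the sketch below scores predicted receptacles against a user's preferred ones. The function name `unseen_accuracy` and the four objects are hypothetical toy data chosen for illustration; they are not taken from the paper's benchmark and do not reproduce its results.

```python
from typing import Dict


def unseen_accuracy(predicted: Dict[str, str], expected: Dict[str, str]) -> float:
    """Fraction of held-out objects whose predicted receptacle matches the user's choice."""
    if not expected:
        raise ValueError("expected placements must not be empty")
    correct = sum(1 for obj, rec in expected.items() if predicted.get(obj) == rec)
    return correct / len(expected)


# Toy held-out objects with the receptacle the user would choose (hypothetical).
expected = {
    "white t-shirt": "drawer",
    "navy socks": "closet",
    "grey hoodie": "closet",
    "cream scarf": "drawer",
}

# Toy predictions: three of the four match the user's preference.
predicted = {
    "white t-shirt": "drawer",
    "navy socks": "closet",
    "grey hoodie": "drawer",
    "cream scarf": "drawer",
}

score = unseen_accuracy(predicted, expected)
print(f"accuracy on unseen objects: {score:.1%}")
```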


This review was generated by AI and checked by a human editor.