QUICK REVIEW

[Paper Review] Large Language Models for Robotics: A Survey

Fanlong Zeng, Wensheng Gan|arXiv (Cornell University)|Nov 13, 2023

Multimodal Machine Learning Applications | Cited by 35

One-Sentence Summary

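This survey summarizes the applications of large language models in robotics, covering control, perception, decision-making, and path planning, and introduces the models, techniques, benefits, challenges, and future directions toward embodied intelligence.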

ABSTRACT

The human ability to learn, generalize, and control complex manipulation tasks through multi-modality feedback suggests a unique capability, which we refer to as dexterity intelligence. Understanding and assessing this intelligence is a complex task. Amidst the swift progress and extensive proliferation of large language models (LLMs), their applications in the field of robotics have garnered increasing attention. LLMs possess the ability to process and generate natural language, facilitating efficient interaction and collaboration with robots. Researchers and engineers in the field of robotics have recognized the immense potential of LLMs in enhancing robot intelligence, human-robot interaction, and autonomy. Therefore, this comprehensive review aims to summarize the applications of LLMs in robotics, delving into their impact and contributions to key areas such as robot control, perception, decision-making, and planning. This survey first provides an overview of the background and development of LLMs for robotics, followed by a discussion of their benefits and recent advancements in LLM-based robotic models. It then explores various techniques, employed in perception, decision-making, control, and interaction, as well as cross-module coordination in practical tasks. Finally, we review current applications of LLMs in robotics and outline potential challenges they may face in the near future. Embodied intelligence represents the future of intelligent systems, and LLM-based robotics is one of the most promising yet challenging paths toward achieving it.

Research Motivation and Objectives

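Review the background and development of LLMs for robotics, along with the concept of embodied intelligence.
Analyze the benefits of, and recent advances in, LLM-based robotic models and applications.
Summarize the LLM-driven robotics techniques used in perception, decision-making, control, and interaction.
Discuss the challenges, limitations, and future directions of integrating LLMs into robotic systems.
Highlight representative LLM-driven robot architectures and platforms.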

Proposed Approach

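Describe the basic concepts and history of LLMs as they relate to robotics.
Survey robotic models that integrate LLMs (e.g., PaLM-SayCan, PaLM-E, LM-Nav, Expedition A1).
Explain Transformer-based robot architectures (RT-1, RT-2, RT-X, Control Transformer) and their roles.
Outline LLM-based techniques for perception, decision-making, control, and interaction (VLM, VNM, VLN, VLA).
Discuss practical considerations for LLM-driven robots, such as multimodal input, planning, and safety.
Summarize potential applications and future directions of embodied intelligence in robotics.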
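
As an illustration of how a language model can be grounded in what a robot can physically do, the sketch below follows the general idea usually attributed to PaLM-SayCan, one of the models listed above: weight the language model's estimate that a skill is useful for the instruction by an affordance estimate that the skill can succeed in the current state, then pick the highest-scoring skill. This is a minimal sketch under stated assumptions, not the survey's or the original system's implementation; the function name `select_skill`, the skill names, and all probabilities are hypothetical.

```python
from typing import Dict


def select_skill(llm_scores: Dict[str, float], affordances: Dict[str, float]) -> str:
    """Return the skill with the highest combined score.

    llm_scores:  hypothetical probabilities that a skill is a useful next
                 step for the instruction (as a language model might rate it).
    affordances: hypothetical probabilities that the skill can succeed in
                 the robot's current state; missing skills count as 0.0.
    """
    combined = {
        skill: llm_scores[skill] * affordances.get(skill, 0.0)
        for skill in llm_scores
    }
    # The skill that is both relevant and feasible wins.
    return max(combined, key=combined.get)


# Made-up values for the instruction "bring me a sponge" when the robot is
# far from the counter, so picking anything up is unlikely to succeed yet.
llm_scores = {"pick up sponge": 0.6, "go to counter": 0.3, "pick up apple": 0.1}
affordances = {"pick up sponge": 0.1, "go to counter": 0.9, "pick up apple": 0.1}

best = select_skill(llm_scores, affordances)
print(best)  # go to counter
```

With these made-up numbers, the language model alone would prefer "pick up sponge", but the low affordance for that skill shifts the choice to "go to counter", which is the kind of grounding effect this combination is meant to produce.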

Experimental Results

Research Questions

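RQ1: How can LLMs serve as a robot's cognitive core (its "brain") to understand instructions and act in the real world?
RQ2: What are the main benefits and limitations of using LLMs for perception, decision-making, control, and interaction in robotic applications?
RQ3: Which Transformer-based architectures enable efficient robots that work in concert with LLMs, and how do they generalize across tasks?
RQ4: What challenges arise when deploying LLM-based robots (computational resources, safety, consistency, standardization), and how can they be addressed?
RQ5: What is the societal impact of LLM-driven embodied intelligence?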

Key Findings

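LLMs give robots natural-language interaction, flexible task execution, and personalized user experiences.
PaLM-E, PaLM-SayCan, LM-Nav, and Expedition A1 demonstrate how language can be bridged with perception, navigation, and control.
Transformer-based robot architectures (RT-1, RT-2, RT-X, CT) have advanced planning, control, and vision-language integration.
Newer concepts such as VLM, VNM, and VLA give robots end-to-end perception-to-action pipelines.
Challenges include heavy computational requirements, content safety, multi-turn dialogue, and the lack of a standardized robot form.
The survey charts a path toward embodied intelligence and discusses the societal impact of increasingly capable robotic systems.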


This review was generated by AI and reviewed by a human editor.