QUICK REVIEW

[Paper Review] Generalized Grounding Graphs: A Probabilistic Framework for Understanding Grounded Commands

Thomas Kollar, Stefanie Tellex|arXiv (Cornell University)|Nov 29, 2017

Multimodal Machine Learning Applications | References: 75 | Cited by: 23

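One-Sentence Summary

This paper proposes Generalized Grounding Graphs (G3), a probabilistic graphical model that dynamically builds the interface between language and the world from the syntactic structure of a natural language command. By modeling the compositional and hierarchical nature of language, G3 robustly maps noun phrases, prepositional phrases, and verbs to physical objects, places, and actions. It has been applied to mobility and manipulation tasks on several robotic platforms, including a forklift, a wheelchair, a micro-air vehicle (MAV), and the PR2.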

ABSTRACT

Many task domains require robots to interpret and act upon natural language commands which are given by people and which refer to the robot's physical surroundings. Such interpretation is known variously as the symbol grounding problem, grounded semantics and grounded language acquisition. This problem is challenging because people employ diverse vocabulary and grammar, and because robots have substantial uncertainty about the nature and contents of their surroundings, making it difficult to associate the constitutive language elements (principally noun phrases and spatial relations) of the command text to elements of those surroundings. Symbolic models capture linguistic structure but have not scaled successfully to handle the diverse language produced by untrained users. Existing statistical approaches can better handle diversity, but have not to date modeled complex linguistic structure, limiting achievable accuracy. Recent hybrid approaches have addressed limitations in scaling and complexity, but have not effectively associated linguistic and perceptual features. Our framework, called Generalized Grounding Graphs (G^3), addresses these issues by defining a probabilistic graphical model dynamically according to the linguistic parse structure of a natural language command. This approach scales effectively, handles linguistic diversity, and enables the system to associate parts of a command with the specific objects, places, and events in the external world to which they refer. We show that robots can learn word meanings and use those learned meanings to robustly follow natural language commands produced by untrained users. We demonstrate our approach for both mobility commands and mobile manipulation commands involving a variety of semi-autonomous robotic platforms, including a wheelchair, a micro-air vehicle, a forklift, and the Willow Garage PR2.

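Motivation and Objectives

Address the challenge of grounding natural language commands in complex, uncertain physical environments.
Overcome the limited scalability and learnability of symbolic systems, as well as the inability of flat statistical models to capture the compositionality of language.
Enable robots to understand the diverse, unstructured language of untrained users by learning the grounded meanings of words.
Dynamically construct a probabilistic graphical model from the linguistic parse structure to support compositional reasoning.
Generalize, within a single unified framework, to novel commands not seen during training across multiple robotic platforms.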

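Proposed Method

G3 dynamically generates a probabilistic graphical model from the syntactic parse of a natural language command.
The framework models linguistic constituents (such as noun phrases and prepositional phrases) as random variables and grounds them to physical entities (such as objects, places, paths, or actions).
Supervised learning on a corpus of paired language and physical groundings trains probabilistic predicates that map linguistic features to perceptual and environmental state.
Grounding is performed by probabilistic inference over the dynamically constructed graph, yielding the most likely interpretation of the command.
By preserving the argument structure and hierarchical relations in language, the framework supports compositional reasoning over nested clauses and multi-argument verbs.
The framework handles uncertainty and can be extended with active learning, for example by asking a clarifying question when grounding ambiguity exceeds a threshold.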
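
The idea of building the graph from the parse and then searching for the most likely joint grounding can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the `Constituent` class, the toy `world`, the hand-built parse, and especially `factor_score`, which uses fixed hand-written scores (0.9 or 0.1) in place of the learned predicates the review describes. Inference is a brute-force search over all joint assignments, which is only feasible for tiny examples and is not claimed to be the paper's inference procedure.

```python
from dataclasses import dataclass, field
from itertools import product
from math import log


@dataclass
class Constituent:
    """One parse node: a phrase type, its words, and its child constituents."""
    kind: str   # "NP", "PP", or "VP" (illustrative labels)
    text: str
    children: list = field(default_factory=list)


def flatten(node):
    """List every constituent in the parse, parents before children."""
    nodes = [node]
    for child in node.children:
        nodes.extend(flatten(child))
    return nodes


def candidates(node, world):
    """Candidate groundings: objects for NPs, places for PPs, actions for VPs."""
    objects = list(world)
    places = [("on", obj) for obj in objects]
    if node.kind == "NP":
        return objects
    if node.kind == "PP":
        return places
    return [("move", obj, place) for obj in objects for place in places]


def factor_score(node, grounding, child_groundings, world):
    """Hand-written stand-in for a learned factor (assumption, not from the paper)."""
    if node.kind == "NP":
        match = world[grounding] in node.text
    elif node.kind == "PP":
        # The place must be defined relative to the child NP's object.
        match = grounding[1] == child_groundings[0]
    else:
        # The action must move the child NP's object to the child PP's place.
        _, obj, place = grounding
        match = [obj, place] == child_groundings and obj != place[1]
    return 0.9 if match else 0.1


def ground(root, world):
    """Exhaustively search joint groundings; maximize the sum of log factor scores."""
    nodes = flatten(root)
    best, best_score = None, float("-inf")
    for combo in product(*(candidates(n, world) for n in nodes)):
        assignment = {id(n): g for n, g in zip(nodes, combo)}
        score = sum(
            log(factor_score(n, assignment[id(n)],
                             [assignment[id(c)] for c in n.children], world))
            for n in nodes
        )
        if score > best_score:
            best, best_score = combo, score
    return {n.text: g for n, g in zip(nodes, best)}


# Toy world: object id -> object type.
world = {"pallet_1": "pallet", "truck_1": "truck", "trailer_1": "trailer"}

# Hand-built parse of "put the pallet on the truck".
command = Constituent("VP", "put", [
    Constituent("NP", "the pallet"),
    Constituent("PP", "on", [Constituent("NP", "the truck")]),
])

result = ground(command, world)
print(result)
```

Each factor sees only one constituent and the groundings of its direct children, so the structure of the score follows the structure of the parse. That is the sense in which the graph is built dynamically per command: a different sentence would produce a different set of factors.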

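Experimental Results

Research Questions

RQ1: Can a probabilistic framework that dynamically models the compositional structure of natural language commands achieve robust grounding in physical environments?
RQ2: How well does such a framework generalize to novel commands not seen in the training data?
RQ3: Can the system maintain accurate grounding despite the linguistic diversity produced by untrained users?
RQ4: How broadly does the framework apply across robotic platforms with different perception and action capabilities?
RQ5: To what extent does incorporating linguistic structure improve grounding accuracy compared with flat or purely symbolic models?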

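Key Findings

G3 successfully grounded natural language commands on four different robotic platforms: a forklift, a wheelchair, a micro-air vehicle (MAV), and the PR2.
By modeling hierarchical and compositional structure, the framework robustly interprets complex commands with multiple linguistic constituents, such as "Put the tire pallet on the truck."
The system generalizes to novel commands absent from the training set by performing probabilistic inference over learned word meanings on the grounding graph.
By capturing argument structure and supporting compositional reasoning, the approach outperforms flat statistical models and symbolic systems.
An active learning extension based on entropy-based uncertainty detection lets the robot ask targeted clarifying questions when a grounding is ambiguous.
The framework scales and adapts to domain-specific language (including imperatives and spatial relations) and remains effective even when the world model is imperfect.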
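
The entropy-based trigger for clarifying questions can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the `posteriors` values, the `threshold_bits` cutoff, and the question template are all made up for the example. The only assumption about the underlying system is that it can supply, for each phrase, a probability distribution over candidate groundings.

```python
from math import log2


def entropy(dist):
    """Shannon entropy in bits of a distribution given as {grounding: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def most_uncertain_phrase(posteriors):
    """Return the phrase whose grounding distribution has the highest entropy."""
    return max(posteriors, key=lambda phrase: entropy(posteriors[phrase]))


def maybe_ask(posteriors, threshold_bits=0.8):
    """Return a clarifying question if the most uncertain phrase exceeds the threshold."""
    phrase = most_uncertain_phrase(posteriors)
    if entropy(posteriors[phrase]) > threshold_bits:
        return f'What do the words "{phrase}" refer to?'
    return None  # Confident enough to act without asking.


# Hypothetical per-phrase posteriors over candidate groundings.
posteriors = {
    "the pallet": {"pallet_1": 0.5, "pallet_2": 0.5},    # ambiguous: 1 bit
    "the truck": {"truck_1": 0.95, "trailer_1": 0.05},   # nearly certain
}

print(maybe_ask(posteriors))
```

Targeting the single highest-entropy phrase keeps the question specific: the robot asks about the part of the command it is least sure of, not the whole command.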


This review was generated by AI and reviewed by human editors.