QUICK REVIEW

[Paper Review] Unsupervised Grounding of Plannable First-Order Logic Representation from Images

Masataro Asai|arXiv (Cornell University)|Feb 21, 2019

Reinforcement Learning in Robotics | Cited by 20

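One-Sentence Summary

The paper proposes the First-Order State AutoEncoder (FOSAE), an unsupervised neural network that learns interpretable first-order logic predicates from image-based object features. By jointly encoding object features and discovering reusable relational patterns, FOSAE produces a compact symbolic representation compatible with classical planning, and it succeeds on both 8-Puzzle and a photo-realistic Blocksworld environment.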

ABSTRACT

Recently, there is an increasing interest in obtaining the relational structures of the environment in the Reinforcement Learning community. However, the resulting "relations" are not the discrete, logical predicates compatible to the symbolic reasoning such as classical planning or goal recognition. Meanwhile, Latplan (Asai and Fukunaga 2018) bridged the gap between deep-learning perceptual systems and symbolic classical planners. One key component of the system is a Neural Network called State AutoEncoder (SAE), which encodes an image-based input into a propositional representation compatible to classical planning. To get the best of both worlds, we propose First-Order State AutoEncoder, an unsupervised architecture for grounding the first-order logic predicates and facts. Each predicate models a relationship between objects by taking the interpretable arguments and returning a propositional value. In the experiment using 8-Puzzle and a photo-realistic Blocksworld environment, we show that (1) the resulting predicates capture the interpretable relations (e.g. spatial), (2) they help obtaining the compact, abstract model of the environment, and finally, (3) the resulting model is compatible to symbolic classical planning.

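Motivation and Objectives

Bridge the gap between neural perception and symbolic reasoning by constructing first-order logic from visual input.
Address the limitations of propositional representations in classical planning by enabling symbolic abstraction over relations and object arguments.
Develop an unsupervised method that discovers interpretable, reusable predicates without human-labeled relations or reward signals.
Ensure that the learned representation is compact, generalizes well, and can be used directly in PDDL-based classical planning systems.
Enable end-to-end symbolic reasoning from raw visual observations through a differentiable, attention-based architecture.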

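Proposed Method

FOSAE uses a neural autoencoder architecture that processes object feature vectors, derived from image patches and bounding boxes, to reconstruct the input state.
It uses an attention mechanism to identify the relevant object pairs or tuples for each predicate, so that arguments are selected dynamically for each observation.
The model shares weights across object tuples, which lets it generalize by learning common relational patterns and reduces the parameter count.
Predicates are learned without supervision through a reconstruction loss; there is no supervisory signal for predicate symbols or human-labeled relations.
The architecture supports variable predicate arity and learns anonymous predicate symbols whose meaning can be inferred from the patterns of arguments they are instantiated with.
The output is a set of first-order logic facts (predicates with object arguments) compatible with PDDL planning systems.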
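
The encoder side of the method described above can be illustrated with a small NumPy sketch. It is not the paper's implementation: the sizes, the random untrained weights (`W_att`, `W_pred`), the single linear layer per predicate, and the hard threshold used for discretization are all assumptions chosen to keep the example short, and the decoder and reconstruction loss are omitted. It only shows the data flow: attention picks the arguments of each tuple, one shared set of predicate weights is applied to every tuple, and the true predicates are emitted as facts over anonymous symbols.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not values from the paper)
N_OBJECTS, N_FEATURES = 4, 6   # objects per state, features per object
N_TUPLES, ARITY = 3, 2         # attended tuples, arguments per tuple
N_PREDICATES = 5               # anonymous predicates p0..p4

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Random, untrained weights standing in for learned parameters
W_att = rng.normal(size=(N_TUPLES, ARITY, N_OBJECTS * N_FEATURES, N_OBJECTS))
W_pred = rng.normal(size=(N_PREDICATES, ARITY * N_FEATURES))

def select_arguments(objects):
    """Soft-select one object per argument slot of each tuple via attention."""
    flat = objects.reshape(-1)                      # (N*F,)
    logits = np.einsum("f,uafn->uan", flat, W_att)  # (U, A, N)
    attention = softmax(logits, axis=-1)            # each slot sums to 1
    args = attention @ objects                      # (U, A, F)
    return attention, args

def evaluate_predicates(args):
    """Apply the same predicate weights to every tuple (weight sharing)."""
    tuples = args.reshape(N_TUPLES, -1)             # (U, A*F)
    scores = tuples @ W_pred.T                      # (U, P)
    return scores > 0                               # hard threshold (assumption)

def to_facts(attention, truth):
    """Name each argument by its most-attended object and list true predicates."""
    arg_ids = attention.argmax(axis=-1)             # (U, A)
    facts = set()
    for u in range(N_TUPLES):
        names = tuple(f"o{i}" for i in arg_ids[u])
        for p in range(N_PREDICATES):
            if truth[u, p]:
                facts.add((f"p{p}",) + names)
    return facts

objects = rng.normal(size=(N_OBJECTS, N_FEATURES))  # stand-in object features
attention, args = select_arguments(objects)
truth = evaluate_predicates(args)
facts = to_facts(attention, truth)
print(sorted(facts))
```

Because `W_pred` does not depend on the tuple index, adding more tuples increases the number of facts without adding predicate parameters, which is the weight-sharing property the summary refers to.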

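Experimental Results

Research Questions

RQ1: Can an unsupervised neural network learn interpretable first-order logic predicates directly from visual object features?
RQ2: How well do the discovered predicates generalize across object configurations and environments?
RQ3: Can the resulting symbolic representation effectively support classical planning in visually grounded domains?
RQ4: To what extent does the architecture promote compact and reusable relational patterns?
RQ5: To what extent does attention-based argument selection improve the interpretability and generalization of the learned predicates?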

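Key Findings

FOSAE learns interpretable spatial and relational predicates from visual input, as confirmed by human interpretation of the argument instantiation patterns.
The model reconstructs input states accurately; visual examples show close agreement between the real and reconstructed images.
In the 8-Puzzle domain, FOSAE learns a compact, generalizable representation that supports correct planning across multiple test instances.
In the photo-realistic Blocksworld environment, FOSAE produces a PDDL-compatible model that yields correct plans for 30 randomly generated 3-block instances.
The system scales to the 4-block environment, for which planning results are reported, but planning in the 5-block environment was not feasible because of memory limits.
The resulting symbolic representation was verified to be compatible with classical planners, and the plans were manually confirmed to be correct.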
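
To make "compatible with classical planners" concrete, the sketch below renders a set of ground facts as a PDDL problem file. It is an illustration only and does not reproduce the paper's planning pipeline: the function `render_pddl_problem`, the problem and domain names, and the example facts are all hypothetical, and a matching domain file with action definitions is assumed to exist elsewhere.

```python
def render_pddl_problem(name, domain, init_facts, goal_facts):
    """Render ground facts, given as (predicate, arg1, arg2, ...) tuples, as PDDL text."""
    # Collect every object mentioned in either the initial state or the goal
    objects = sorted({arg for fact in init_facts | goal_facts for arg in fact[1:]})

    def atom(fact):
        return "(" + " ".join(fact) + ")"

    lines = [
        f"(define (problem {name})",
        f"  (:domain {domain})",
        f"  (:objects {' '.join(objects)})",
        "  (:init " + " ".join(atom(f) for f in sorted(init_facts)) + ")",
        "  (:goal (and " + " ".join(atom(f) for f in sorted(goal_facts)) + ")))",
    ]
    return "\n".join(lines)

# Hypothetical facts over anonymous predicates p0, p2 and objects o0..o2
init = {("p0", "o0", "o1"), ("p2", "o1", "o2")}
goal = {("p0", "o1", "o0")}
problem_text = render_pddl_problem("fosae-demo", "fosae-demo-domain", init, goal)
print(problem_text)
```

The predicate symbols stay anonymous in the output; a planner only needs them to be used consistently between the problem and the domain, not to carry human-readable names.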

This review was generated by AI and reviewed by a human editor.