QUICK REVIEW

[Paper Review] Classical Planning in Deep Latent Space: Bridging the Subsymbolic-Symbolic Boundary

Masataro Asai, Alex Fukunaga|arXiv (Cornell University)|Apr 29, 2017

AI-based Problem Solving and Planning | Cited by 30

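SUMMARY IN BRIEF

LatPlan is an unsupervised framework that connects subsymbolic visual input to symbolic classical planning: a Variational Autoencoder (the State Autoencoder) learns a discrete, propositional latent space, and an Action Autoencoder / Discriminator jointly infers action symbols and their models. Without any human-provided symbolic model, it performs domain-independent planning from raw image pairs, finds optimal solutions in image-based versions of the 8-puzzle, Towers of Hanoi, and LightsOut, and returns a visualized plan execution.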

ABSTRACT

Current domain-independent, classical planners require symbolic models of the problem domain and instance as input, resulting in a knowledge acquisition bottleneck. Meanwhile, although deep learning has achieved significant success in many fields, the knowledge is encoded in a subsymbolic representation which is incompatible with symbolic systems such as planners. We propose LatPlan, an unsupervised architecture combining deep learning and classical planning. Given only an unlabeled set of image pairs showing a subset of transitions allowed in the environment (training inputs), and a pair of images representing the initial and the goal states (planning inputs), LatPlan finds a plan to the goal state in a symbolic latent space and returns a visualized plan execution. The contribution of this paper is twofold: (1) State Autoencoder, which finds a propositional state representation of the environment using a Variational Autoencoder. It generates a discrete latent vector from the images, based on which a PDDL model can be constructed and then solved by an off-the-shelf planner. (2) Action Autoencoder / Discriminator, a neural architecture which jointly finds the action symbols and the implicit action models (preconditions/effects), and provides a successor function for the implicit graph search. We evaluate LatPlan using image-based versions of 3 planning domains: 8-puzzle, Towers of Hanoi and LightsOut.
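
As a rough illustration of how a PDDL model could be built from discrete latent vectors, the sketch below turns each observed pair of bit vectors into one ground action. It is a minimal sketch and not necessarily the encoding used by the authors; the proposition names `z0`, `z1`, ..., the action names `a0`, `a1`, ..., and both helper functions are hypothetical.

```python
from typing import Sequence, Tuple

Bits = Tuple[int, ...]


def literal(index: int, value: int) -> str:
    """Render latent bit `index` as a positive or negated PDDL literal."""
    return f"(z{index})" if value else f"(not (z{index}))"


def transition_to_action(name: str, pre: Bits, suc: Bits) -> str:
    """One ground action per observed transition (an assumed encoding).

    The precondition pins down the full predecessor state; the effect
    lists only the bits that change.
    """
    if len(pre) != len(suc):
        raise ValueError("latent vectors must have the same length")
    precond = " ".join(literal(i, b) for i, b in enumerate(pre))
    effect = " ".join(
        literal(i, s) for i, (p, s) in enumerate(zip(pre, suc)) if p != s
    )
    return (
        f"(:action {name}\n"
        f"  :precondition (and {precond})\n"
        f"  :effect (and {effect}))"
    )


def build_domain(transitions: Sequence[Tuple[Bits, Bits]],
                 domain_name: str = "latent") -> str:
    """Assemble a PDDL domain string from (predecessor, successor) pairs."""
    n_bits = len(transitions[0][0])
    predicates = " ".join(f"(z{i})" for i in range(n_bits))
    actions = "\n".join(
        transition_to_action(f"a{k}", pre, suc)
        for k, (pre, suc) in enumerate(transitions)
    )
    return (
        f"(define (domain {domain_name})\n"
        f" (:requirements :strips :negative-preconditions)\n"
        f" (:predicates {predicates})\n"
        f"{actions})"
    )


if __name__ == "__main__":
    print(build_domain([((0, 1, 0), (1, 1, 0)), ((1, 1, 0), (1, 0, 0))]))
```

Encoding one action per observed transition keeps the sketch simple, but the domain file grows linearly with the number of training pairs.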

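MOTIVATION AND OBJECTIVES

To address the knowledge acquisition bottleneck in classical planning, where symbolic PDDL models must be built by hand.
To bridge the gap between subsymbolic and symbolic representations by automatically embedding visual input into a symbolic planning representation, without prior assumptions about the structure of the environment.
To enable domain-independent planning from unlabeled image transitions and an initial-goal image pair, without human-provided action models or predicates.
To show that deep learning can automatically induce a symbolic planning model from visual data, so that an off-the-shelf planner that is optimal and complete can solve the resulting problems.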

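PROPOSED METHOD

The State Autoencoder (SAE) uses a Variational Autoencoder to map raw images to discrete, propositional latent vectors, yielding a symbolic state representation.
The Action Autoencoder (AAE) and the Discriminator jointly infer action symbols and their implicit preconditions and effects from unlabeled image transitions.
The AAE/D system learns a successor function by distinguishing real transitions from generated ones, which enables implicit graph search in the latent space.
A symbolic planner runs on a PDDL model constructed from the learned latent representation, using an off-the-shelf planner to find optimal solutions.
Plan execution is visualized by decoding the sequence of latent states back into a sequence of images.
Training uses a 9:1 split between training and validation data, and state augmentation is applied in domains with small state spaces (such as Towers of Hanoi) to improve generalization.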
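
The implicit graph search over latent states can be sketched as follows. This is a minimal sketch under stated assumptions: breadth-first search stands in for whatever search the planner actually runs, and the `successors` callable is a hypothetical placeholder for the learned AAE/D successor function. The toy `flip_one_bit` function only exists to make the example runnable.

```python
from collections import deque
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Tuple

State = Tuple[int, ...]


def latent_bfs(init: State,
               goal: State,
               successors: Callable[[State], Iterable[State]]
               ) -> Optional[List[State]]:
    """Breadth-first search over latent bit vectors.

    Returns a shortest sequence of states from `init` to `goal`
    (unit action costs), or None if the goal is unreachable.
    """
    parent: Dict[State, Optional[State]] = {init: None}
    queue = deque([init])
    while queue:
        state = queue.popleft()
        if state == goal:
            # Walk the parent links back to the initial state.
            path: List[State] = []
            cursor: Optional[State] = state
            while cursor is not None:
                path.append(cursor)
                cursor = parent[cursor]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None


def flip_one_bit(state: State) -> Iterator[State]:
    """Toy successor function (assumption): flip exactly one bit."""
    for i in range(len(state)):
        yield state[:i] + (1 - state[i],) + state[i + 1:]


if __name__ == "__main__":
    for step in latent_bfs((0, 0, 0), (1, 1, 0), flip_one_bit):
        print(step)
```

In the full pipeline, each state on the returned path would then be passed through the SAE decoder to produce the visualized plan.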

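EXPERIMENTAL RESULTS

Research Questions

RQ1: Can a deep learning system automatically induce a symbolic PDDL model from only unlabeled image transitions and an initial-goal image pair?
RQ2: Can the system learn action symbols and their preconditions and effects without human-provided action definitions or grounding annotations?
RQ3: Does the latent representation retain enough structure to support optimal planning with an off-the-shelf classical planner?
RQ4: Does the system generalize to domains with non-local effects (such as LightsOut) and to domains with dynamic objects (such as lights that disappear)?
RQ5: Is the learned symbolic representation robust and general across visual domains, including scrambled-image and distorted variants (such as a swirl effect)?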
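
To make the LightsOut effects mentioned in RQ4 concrete, the sketch below implements one button press. It assumes the standard LightsOut rule (a press toggles the pressed cell and its orthogonal neighbors) on a 4x4 grid. The function name `press` and the tuple-of-tuples grid encoding are choices made for this illustration only.

```python
from typing import Tuple

Grid = Tuple[Tuple[int, ...], ...]


def press(grid: Grid, row: int, col: int) -> Grid:
    """Toggle the pressed cell and its up/down/left/right neighbors."""
    size = len(grid)
    new = [list(r) for r in grid]
    for d_row, d_col in ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)):
        r, c = row + d_row, col + d_col
        if 0 <= r < size and 0 <= c < size:
            new[r][c] ^= 1  # flip 0 <-> 1
    return tuple(tuple(r) for r in new)


def changed_cells(before: Grid, after: Grid) -> int:
    """Count the cells whose value differs between two grids."""
    return sum(
        a != b
        for row_a, row_b in zip(before, after)
        for a, b in zip(row_a, row_b)
    )


if __name__ == "__main__":
    empty: Grid = tuple(tuple(0 for _ in range(4)) for _ in range(4))
    print(changed_cells(empty, press(empty, 1, 1)))  # interior press: 5 cells
    print(changed_cells(empty, press(empty, 0, 0)))  # corner press: 3 cells
```

An interior press changes 5 of the 16 cells, so a single action alters a sizable part of the image at once.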

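Key Findings

LatPlan learned symbolic representations in the 8-puzzle, Towers of Hanoi, and LightsOut domains from unlabeled image transitions alone, with no human-provided symbolic model.
The system found optimal solutions in all tested domains, including the 8-puzzle, which has 362,880 states and 967,680 actions (state transitions), while training on only 20,000 unlabeled transitions.
The Action Autoencoder and the Discriminator inferred action symbols together with their preconditions and effects, which allowed a correct successor function to be learned in the latent space.
The method generalizes to more complex domains such as LightsOut, where a single action can affect up to 5 of the 16 grid cells, and it even handles a variant with swirl distortion.
The system is robust to visual perturbations (for example, the Mandrill and Spider 8-puzzles) and handles domains in which objects disappear, showing flexibility beyond environments with local effects and static objects.
The entire system, including pretrained weights and source code, has been publicly released on GitHub for reproducibility.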
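
The 8-puzzle figures quoted above can be reproduced by direct counting. This is a short check rather than part of the paper's method. It counts every arrangement of the eight tiles and the blank on a 3x3 board, and every legal blank move from each arrangement; the function names are hypothetical.

```python
from math import factorial


def count_states(side: int = 3) -> int:
    """All arrangements of the tiles plus the blank on a side x side board."""
    return factorial(side * side)


def count_transitions(side: int = 3) -> int:
    """Directed transitions: legal blank moves summed over all arrangements."""
    blank_moves = 0
    for row in range(side):
        for col in range(side):
            # Number of in-bounds neighbors the blank can swap with.
            blank_moves += sum(
                0 <= row + d_row < side and 0 <= col + d_col < side
                for d_row, d_col in ((1, 0), (-1, 0), (0, 1), (0, -1))
            )
    # Each blank position occurs with (side*side - 1)! tile arrangements.
    return blank_moves * factorial(side * side - 1)


if __name__ == "__main__":
    print(count_states())       # 362880
    print(count_transitions())  # 967680
```

The blank has 24 position-move combinations on a 3x3 board, and each combines with 8! = 40,320 tile arrangements, which gives 967,680. Training on 20,000 transitions therefore covers only about 2% of them.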


This review was generated by AI and reviewed by a human editor.