QUICK REVIEW

[Paper Review] Towards Deep Symbolic Reinforcement Learning

Marta Garnelo, Kai Arulkumaran|arXiv (Cornell University)|Sep 18, 2016

Reinforcement Learning in Robotics | References: 29 | Cited by: 145

One-Sentence Summary

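This paper proposes a hybrid neural-symbolic reinforcement learning architecture, with a neural back end for symbol grounding and a symbolic front end for policy learning. On variants of a simple game, it shows advantages in data efficiency and transfer over fully neural deep reinforcement learning.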

ABSTRACT

Deep reinforcement learning (DRL) brings the power of deep neural networks to bear on the generic task of trial-and-error learning, and its effectiveness has been convincingly demonstrated on tasks such as Atari video games and the game of Go. However, contemporary DRL systems inherit a number of shortcomings from the current generation of deep learning techniques. For example, they require very large datasets to work effectively, entailing that they are slow to learn even when such datasets are available. Moreover, they lack the ability to reason on an abstract level, which makes it difficult to implement high-level cognitive functions such as transfer learning, analogical reasoning, and hypothesis-based reasoning. Finally, their operation is largely opaque to humans, rendering them unsuitable for domains in which verifiability is important. In this paper, we propose an end-to-end reinforcement learning architecture comprising a neural back end and a symbolic front end with the potential to overcome each of these shortcomings. As proof-of-concept, we present a preliminary implementation of the architecture and apply it to several variants of a simple video game. We show that the resulting system -- though just a prototype -- learns effectively, and, by acquiring a set of symbolic rules that are easily comprehensible to humans, dramatically outperforms a conventional, fully neural DRL system on a stochastic variant of the game.

Motivation and Goals

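Identify key shortcomings of deep reinforcement learning (DRL), namely data inefficiency, brittleness, lack of high-level reasoning, and opacity, and propose a way to address them.
Propose an end-to-end architecture that combines a neural back end for symbol grounding with a symbolic front end for decision-making.
Present a proof-of-concept implementation on variants of a simple video game to illustrate the benefits of symbolic reasoning.
Highlight how symbolic representations could enable transfer learning and transparency.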

Proposed Method

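Three-stage pipeline: (1) low-level symbol generation, in which a convolutional autoencoder produces symbolic tokens from raw perceptual input.
(2) Spatio-temporal symbolic state construction based on object persistence, object types, and relations, which tracks how objects change over time.
(3) Reinforcement learning with a local, compositional approach: separate Q-functions are trained for the interactions between pairs of object types, and their values are combined to select actions.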
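The compositional Q-learning stage can be illustrated with a small tabular sketch. This is not the authors' code; it rests on our own assumptions: one Q-table per (agent type, object type) pair, each pair's state is the object's position relative to the agent, only objects within a hypothetical `radius` interact, every nearby pair is updated with the shared environment reward, and actions are chosen by summing the Q-values of all current interactions. Object identity is kept by list index as a stand-in for the paper's object-persistence tracking. `CompositionalQ`, `ACTIONS`, and all parameters are hypothetical names.

```python
import random
from collections import defaultdict

ACTIONS = ["up", "down", "left", "right"]  # hypothetical action set


class CompositionalQ:
    """Sketch: one tabular Q-function per (agent_type, object_type) pair.

    Agent and objects are (x, y, type) tuples. A pair's state is the
    object's position relative to the agent.
    """

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, radius=2):
        self.alpha, self.gamma = alpha, gamma
        self.epsilon, self.radius = epsilon, radius
        # q[(agent_type, obj_type)][(dx, dy)] -> list of per-action values
        self.q = defaultdict(lambda: defaultdict(lambda: [0.0] * len(ACTIONS)))

    def _near(self, rel):
        # Chebyshev distance: only objects inside the square neighbourhood interact.
        return max(abs(rel[0]), abs(rel[1])) <= self.radius

    def interactions(self, agent, objects):
        """Return [((agent_type, obj_type), (dx, dy)), ...] for nearby objects."""
        ax, ay, atype = agent
        result = []
        for ox, oy, otype in objects:
            rel = (ox - ax, oy - ay)
            if self._near(rel):
                result.append(((atype, otype), rel))
        return result

    def action_values(self, agent, objects):
        """Sum the Q-values of every current interaction, per action."""
        totals = [0.0] * len(ACTIONS)
        for pair, rel in self.interactions(agent, objects):
            for a, v in enumerate(self.q[pair][rel]):
                totals[a] += v
        return totals

    def act(self, agent, objects):
        """Epsilon-greedy over the summed values (ties go to the lowest index)."""
        if random.random() < self.epsilon:
            return random.randrange(len(ACTIONS))
        values = self.action_values(agent, objects)
        return max(range(len(ACTIONS)), key=values.__getitem__)

    def update(self, agent, objects, action, reward, next_agent, next_objects, done=False):
        """Independent Q-learning update for each nearby (agent, object) pair.

        Assumes objects[i] and next_objects[i] are the same object.
        """
        ax, ay, atype = agent
        nx, ny, _ = next_agent
        for (ox, oy, otype), (px, py, _) in zip(objects, next_objects):
            rel = (ox - ax, oy - ay)
            if not self._near(rel):
                continue  # distant objects are not updated
            next_rel = (px - nx, py - ny)
            table = self.q[(atype, otype)]
            target = reward if done else reward + self.gamma * max(table[next_rel])
            table[rel][action] += self.alpha * (target - table[rel][action])
```

For example, after one rewarded step to the right onto a `"+"` object one cell away, the `("agent", "+")` table holds a positive value for "right" at relative position `(1, 0)`. That value is reused wherever the agent meets a `"+"` object in the same relative position, which is the kind of reuse the local, type-coupled design aims for.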

Experimental Results

Research Questions

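RQ1: Can the neural back end learn a compositional, grounded symbolic representation from raw perceptual data?
RQ2: Can the symbolic front end enable data-efficient learning and transfer in reinforcement learning tasks?
RQ3: What advantages do localized Q-functions, each tied to a pair of object types, offer over a monolithic neural policy in simple environments?
RQ4: How does the approach compare with conventional DRL (DQN) on simple game variants that differ in object types and randomization?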

Key Findings

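Across the four game variants in the prototype setting, the hybrid architecture learns effectively.
On the most challenging variant, with randomized objects, the symbolic approach clearly outperforms DQN: it learns a competent policy, whereas DQN fails to do so within 1000 rounds of training.
The system shows transfer-like benefits, generalizing to new variants without retraining the back end.
The symbolic front end provides a human-comprehensible chain of reasoning for each action, expressed through the relevant Q-functions and object interactions.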
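When actions are chosen by summing per-interaction Q-values, the individual terms can be read back as a rough explanation of a decision. The sketch below is only an illustration under our own assumptions: the hypothetical `explain_action` takes a dict that maps a readable interaction label to its list of per-action Q-values. It returns the greedy action and the interactions ranked by their contribution to it. The paper does not describe this function.

```python
def explain_action(contributions, actions):
    """Return the greedy action and the interactions ranked by their support for it.

    contributions: {interaction_label: [q_value_per_action, ...]} (hypothetical format)
    actions: list of action names, aligned with the per-action value lists
    """
    # Combine interactions by summing their Q-values for each action.
    totals = [sum(vals[a] for vals in contributions.values()) for a in range(len(actions))]
    best = max(range(len(actions)), key=totals.__getitem__)
    # Rank interactions by the value each one assigns to the chosen action.
    ranked = sorted(contributions.items(), key=lambda kv: kv[1][best], reverse=True)
    return actions[best], [(label, vals[best]) for label, vals in ranked]
```

A result such as `("right", [("agent/+ at (1, 0)", 0.5), ...])` reads as "move right, mainly because a positive object lies one cell to the right". This is the kind of inspectable trace that a single monolithic Q-network does not expose directly.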

This review was generated by AI and checked by human editors.