QUICK REVIEW

[Paper Review] Deep Active Inference for Autonomous Robot Navigation

Ozan Çatal, Samuel T. Wauthier | arXiv (Cornell University) | Mar 6, 2020

Scientific Computing and Data Management | Cited by 3

One-Sentence Summary

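This paper proposes a deep active inference approach for real-world robot navigation in which deep neural networks learn state representations end-to-end from high-dimensional camera data. To the authors' knowledge, it is the first application of deep active inference on a physical robot: the robot plans by minimizing expected free energy, navigates autonomously to its preferred states, holds a stable path, and recovers from perturbations.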

ABSTRACT

Active inference is a theory that underpins the way biological agents perceive and act in the real world. At its core, active inference is based on the principle that the brain is an approximate Bayesian inference engine, building an internal generative model to drive agents towards minimal surprise. Although this theory has shown interesting results with grounding in cognitive neuroscience, its application remains limited to simulations with small, predefined sensor and state spaces. In this paper, we leverage recent advances in deep learning to build more complex generative models that can work without a predefined state space. State representations are learned end-to-end from real-world, high-dimensional sensory data such as camera frames. We also show that these generative models can be used to engage in active inference. To the best of our knowledge, this is the first application of deep active inference for a real-world robot navigation task.

Motivation and Objectives

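Extend active inference to real-world robot navigation with high-dimensional sensory input.
Remove the need for predefined state and action spaces by learning the generative model end-to-end from raw observations.
Show that deep neural networks can implement active inference on a physical robot platform.
Validate the approach on a real mobile robot that navigates warehouse aisles by minimizing free energy.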

Proposed Method

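Approximate the variational posterior Q(s_t | s_{t-1}, a_{t-1}, o_t), the likelihood P(o_t | s_t), and the prior P(s_t | s_{t-1}, a_{t-1}) with reparameterized normal distributions.
Train the networks end-to-end on the variational free energy objective, which minimizes the negative log-likelihood of the observations plus the KL divergence between posterior and prior.
Adopt a VAE-like architecture comprising an encoder (q_φ), a decoder (p_ξ), and a recurrent prior (p_θ) that uses an LSTM for temporal modeling.
Plan by generating imagined trajectories under different policies and selecting the action sequence that minimizes the expected free energy G(π).
Select policies with a softmax governed by a precision parameter γ, so that policies with lower expected free energy are more likely to be chosen.
Define the preferred state by demonstration: it is the distribution over states observed while the robot drives down the center of the aisle.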
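
The training objective and the policy selection step in the list above can be illustrated with a small sketch. This is not the authors' implementation: it assumes diagonal Gaussian posteriors and priors, uses plain Python lists in place of network outputs, and the function names (`reparameterize`, `kl_diag_gaussians`, `free_energy_step`, `policy_probabilities`) as well as the numeric values of G are hypothetical, chosen only for illustration.

```python
import math
import random


def reparameterize(mu, sigma, rng):
    """Draw s = mu + sigma * eps with eps ~ N(0, 1), independently per dimension."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]


def kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """KL[N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2)] for diagonal Gaussians."""
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p):
        total += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2 * sp ** 2) - 0.5
    return total


def free_energy_step(neg_log_likelihood, mu_q, sigma_q, mu_p, sigma_p):
    """One-step variational free energy: reconstruction NLL plus KL(posterior || prior)."""
    return neg_log_likelihood + kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p)


def policy_probabilities(expected_free_energies, gamma):
    """Softmax over -gamma * G(pi): a lower G yields a higher probability."""
    logits = [-gamma * g for g in expected_free_energies]
    max_logit = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(l - max_logit) for l in logits]
    norm = sum(weights)
    return [w / norm for w in weights]


# Hypothetical expected free energies for three candidate policies.
policies = ["straight", "left", "right"]
G = [1.2, 3.5, 3.1]
probs = policy_probabilities(G, gamma=2.0)
best = policies[probs.index(max(probs))]

# Sample a latent state from a made-up posterior (mean and std per dimension).
sample = reparameterize([0.0, 1.0], [0.5, 0.5], random.Random(0))
```

In this sketch a larger γ concentrates the probability mass on the policy with the lowest G, while γ close to zero makes the choice nearly uniform. In the setup described above, the G values would come from rolling the learned prior forward under each policy and comparing the imagined states with the demonstrated preferred state, rather than being hard-coded.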

Experimental Results

Research Questions

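RQ1: Can deep active inference be applied successfully to real-world robot navigation with high-dimensional visual observations?
RQ2: Can a generative model learned end-to-end replace the hand-designed state spaces used in active inference?
RQ3: Can the robot stay centered in the aisle while navigating with free energy minimization and policy planning?
RQ4: Can the system recover from external perturbations, such as being pushed during navigation?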

Key Findings

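Across multiple trials, the robot navigated to the preferred state at the center of the aisle, demonstrating robust path following.
Navigation remained stable even when the robot was pushed by hand, showing robustness and the ability to recover.
Imagined trajectories generated under different policies (straight, left, right) correctly predicted the robot's actual behavior, which validates the planning mechanism.
The learned generative model can reconstruct the preferred state from its latent representation, confirming that a useful state representation was learned.
The method achieved stable performance over extended periods without explicit reward shaping or reward modeling.
The method was deployed on a real mobile robot platform with real-time sensory input, which the authors describe as the first real-world deployment of deep active inference.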
