QUICK REVIEW

[Paper Review] Long-term Planning by Short-term Prediction

arXiv (Cornell University) | Feb 4, 2016

Adversarial Robustness in Machine Learning | References: 26 | Cited by: 39

One-sentence summary

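This paper proposes a two-phase approach to long-term planning in autonomous driving that optimizes trajectories using a differentiable short-term prediction model and a recurrent neural network. By casting the planning problem as supervised learning over a differentiable predictor and a sequence model, the approach supports robust policy learning even in adversarial, continuous, multi-agent environments.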

ABSTRACT

We consider planning problems, that often arise in autonomous driving applications, in which an agent should decide on immediate actions so as to optimize a long term objective. For example, when a car tries to merge in a roundabout it should decide on an immediate acceleration/braking command, while the long term effect of the command is the success/failure of the merge. Such problems are characterized by continuous state and action spaces, and by interaction with multiple agents, whose behavior can be adversarial. We argue that dual versions of the MDP framework (that depend on the value function and the $Q$ function) are problematic for autonomous driving applications due to the non Markovian of the natural state space representation, and due to the continuous state and action spaces. We propose to tackle the planning task by decomposing the problem into two phases: First, we apply supervised learning for predicting the near future based on the present. We require that the predictor will be differentiable with respect to the representation of the present. Second, we model a full trajectory of the agent using a recurrent neural network, where unexplained factors are modeled as (additive) input nodes. This allows us to solve the long-term planning problem using supervised learning techniques and direct optimization over the recurrent neural network. Our approach enables us to learn robust policies by incorporating adversarial elements to the environment.
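The two phases described in the abstract can be written compactly as below. This is only a notational sketch: the symbols $\hat{N}$ (learned predictor), $\pi_\theta$ (policy), $\nu_t$ (additive unexplained factor), $c$ (per-step cost), and $T$ (horizon) are assumptions introduced here for illustration and are not taken from this review.

```latex
\begin{aligned}
\hat{s}_{t+1} &= \hat{N}(s_t, a_t)
  && \text{phase 1: short-term predictor, differentiable in } s_t \text{ and } a_t \\
a_t &= \pi_\theta(s_t)
  && \text{policy with parameters } \theta \\
s_{t+1} &= \hat{N}(s_t, a_t) + \nu_t
  && \text{phase 2: recurrence with additive unexplained input } \nu_t \\
\theta^\star &= \arg\min_\theta \sum_{t=1}^{T} c(s_t, a_t)
  && \text{optimized by differentiating through the unrolled recurrence}
\end{aligned}
```

Because $\nu_t$ enters only as an additive input, the recurrence stays differentiable with respect to $\theta$ regardless of how $\nu_t$ is chosen, including adversarially.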

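Motivation and objectives

Address long-term planning in autonomous driving under continuous state and action spaces.
Overcome the limitations that non-Markovian state representations impose on the traditional MDP framework.
Learn robust policies in multi-agent, adversarial environments.
Decompose the complex planning task into supervised learning and direct optimization over a recurrent sequence model.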

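Proposed method

Train a differentiable predictor that predicts the near-future state from the current observation.
Integrate the predictor as a differentiable component of a recurrent neural network (RNN) that models the agent's full trajectory.
Introduce unexplained factors into the RNN as additive input nodes to model uncertainty and adversarial behavior.
Optimize the whole system end to end with a supervised learning objective.
Integrate adversarial elements into the environment during training to improve policy robustness.
Use backpropagation through time to optimize the long-horizon policy directly.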
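The steps listed above can be illustrated with a deliberately tiny sketch. It is not the paper's implementation: the scalar state, the hand-written linear predictor `predict_next`, the one-parameter policy `a_t = -k * s_t`, the quadratic cost, and all constants are assumptions chosen so that backpropagation through time can be written out by hand in plain Python.

```python
import random

DT = 0.1  # hypothetical time step


def predict_next(s, a):
    """Stand-in for a learned short-term predictor: s_next = s + DT * a.
    A real predictor would be a trained model; this one is hand-written."""
    return s + DT * a


def rollout(k, s0, noise):
    """Unroll the policy a_t = -k * s_t through the predictor.
    Each nu in `noise` is an additive unexplained input at that step."""
    states = [s0]
    for nu in noise:
        a = -k * states[-1]
        states.append(predict_next(states[-1], a) + nu)
    return states


def loss_and_grad(k, s0, noise):
    """Loss = sum of s_t^2 for t >= 1, and its gradient w.r.t. k,
    computed by backpropagation through time over the unrolled rollout."""
    states = rollout(k, s0, noise)
    loss = sum(s * s for s in states[1:])
    grad_k = 0.0
    lam = 0.0  # dLoss/ds_{t+1}, carried backward through time
    for t in range(len(noise) - 1, -1, -1):
        # direct cost term plus the contribution flowing back from later steps
        lam = 2.0 * states[t + 1] + lam * (1.0 - DT * k)
        # s_{t+1} depends on k directly through the action: d s_{t+1}/dk = -DT * s_t
        grad_k += lam * (-DT * states[t])
    return loss, grad_k


def train(k=0.0, steps=200, lr=0.01, horizon=20, seed=0):
    """Plain gradient descent on k, resampling the unexplained inputs each step."""
    rng = random.Random(seed)
    for _ in range(steps):
        noise = [rng.uniform(-0.05, 0.05) for _ in range(horizon)]
        _, grad = loss_and_grad(k, 1.0, noise)
        k -= lr * grad
    return k


if __name__ == "__main__":
    k_trained = train()
    print("trained gain:", k_trained)
```

In practice an automatic-differentiation library would replace the hand-written backward pass; it is spelled out here only to show that the gradient reaches the policy parameter through every predicted step.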

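Experimental results

Research questions

RQ1: Can short-term prediction models be used for long-term planning in continuous, multi-agent environments?
RQ2: How can non-Markovian state representations be handled effectively in planning tasks?
RQ3: Can differentiable prediction combined with RNN-based trajectory modeling outperform traditional MDP-based planning methods?
RQ4: To what extent does adversarial training improve policy robustness in a planning system?
RQ5: Can end-to-end optimization over a differentiable RNN architecture achieve effective long-horizon control?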

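Key findings

The proposed method handles long-term planning in environments with continuous state and action spaces.
Differentiable short-term prediction makes backpropagation through time possible for long-horizon optimization.
Introducing adversarial elements during training significantly improves policy robustness.
The RNN-based trajectory model captures unexplained factors through additive input nodes.
By decoupling prediction from planning, the method avoids the limitations of the dual MDP formulations (those based on the value function and the Q function).
End-to-end training with supervised learning techniques yields stable policies that generalize well.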
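One way to picture the adversarial element mentioned in these findings is to let the additive unexplained inputs be chosen against the policy rather than sampled at random. The sketch below is an illustration under stated assumptions, not the paper's mechanism: the scalar dynamics, the gain `k`, the bound `eps`, and the single sign-gradient step used to pick the perturbation are all hypothetical choices made here.

```python
DT = 0.1  # hypothetical time step


def rollout_loss(k, s0, noise):
    """Sum of squared states when the policy a_t = -k * s_t is unrolled
    through s_{t+1} = s_t + DT * a_t + nu_t."""
    s, loss = s0, 0.0
    for nu in noise:
        s = s + DT * (-k * s) + nu
        loss += s * s
    return loss


def noise_gradient(k, s0, noise):
    """dLoss/dnu_t for every step t, via a backward pass over the rollout."""
    states = [s0]
    for nu in noise:
        states.append(states[-1] + DT * (-k * states[-1]) + nu)
    grads = [0.0] * len(noise)
    lam = 0.0  # dLoss/ds_{t+1}, carried backward through time
    for t in range(len(noise) - 1, -1, -1):
        lam = 2.0 * states[t + 1] + lam * (1.0 - DT * k)
        grads[t] = lam  # nu_t enters s_{t+1} with coefficient 1
    return grads


def worst_case_noise(k, s0, horizon, eps):
    """One sign-gradient step from zero noise, bounded by eps per step."""
    grads = noise_gradient(k, s0, [0.0] * horizon)
    return [eps if g >= 0 else -eps for g in grads]


if __name__ == "__main__":
    adversarial = worst_case_noise(k=2.0, s0=1.0, horizon=20, eps=0.05)
    print("loss under adversarial inputs:", rollout_loss(2.0, 1.0, adversarial))
    print("loss with no perturbation:   ", rollout_loss(2.0, 1.0, [0.0] * 20))
```

Training the policy against perturbations selected this way, instead of against random ones, is the general idea behind robust policy learning; how the adversary is actually modeled in the paper is not specified in this review.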


This review was generated by AI and reviewed by a human editor.