QUICK REVIEW

[Paper Review] End-to-End Deep Reinforcement Learning for Lane Keeping Assist

Ahmad El Sallab, Mohammed Abdou|arXiv (Cornell University)|Dec 13, 2016

Reinforcement Learning in Robotics | References: 21 | Cited by: 142

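One-Sentence Summary

This paper applies end-to-end deep reinforcement learning with discrete (DQN) and continuous (DDAC) action spaces to lane keeping in TORCS, comparing their performance and the effect of termination constraints on learning convergence.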

ABSTRACT

Reinforcement learning is considered to be a strong AI paradigm which can be used to teach machines through interaction with the environment and learning from their mistakes, but it has not yet been successfully used for automotive applications. There has recently been a revival of interest in the topic, however, driven by the ability of deep learning algorithms to learn good representations of the environment. Motivated by Google DeepMind's successful demonstrations of learning for games from Breakout to Go, we will propose different methods for autonomous driving using deep reinforcement learning. This is of particular interest as it is difficult to pose autonomous driving as a supervised learning problem as it has a strong interaction with the environment including other vehicles, pedestrians and roadworks. As this is a relatively new area of research for autonomous driving, we will formulate two main categories of algorithms: 1) Discrete actions category, and 2) Continuous actions category. For the discrete actions category, we will deal with Deep Q-Network Algorithm (DQN) while for the continuous actions category, we will deal with Deep Deterministic Actor Critic Algorithm (DDAC). In addition to that, we will also discover the performance of these two categories on an open source car simulator for racing called TORCS, which stands for The Open Racing Car Simulator. Our simulation results demonstrate learning of autonomous maneuvering in a scenario of complex road curvatures and simple interaction with other vehicles. Finally, we explain the effect of some restricted conditions, put on the car during the learning phase, on the convergence time for finishing its learning phase.

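Research Motivation and Objectives

The interactive nature of the driving environment motivates the use of reinforcement learning for autonomous driving.
Study end-to-end models that map raw sensor inputs to driving actions without hand-crafted features.
Compare discrete-action (DQN) and continuous-action (DDAC) deep reinforcement learning methods on the lane keeping task.
Evaluate how restricted termination conditions affect learning convergence time.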

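Proposed Method

Formulate lane keeping as a deep reinforcement learning (DRL) problem with sensor fusion of camera, LIDAR, and radar inputs.
Apply two DRL paradigms: Deep Q-Network (DQN) for discrete actions and Deep Deterministic Actor-Critic (DDAC) for continuous actions.
Train the end-to-end network in the TORCS simulator, using trackPos and vehicle speed as inputs and producing steering angle, gear, acceleration, and brake as outputs.
Discretize actions with tile coding for DQN, and use a policy gradient with an actor-critic architecture for DDAC.
Evaluate performance on straight and curved track sections to compare convergence and trajectory quality.
Examine the effect of termination conditions (no termination, out of track, stuck, out of track and stuck) on convergence time.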
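To make the discrete versus continuous distinction concrete, here is a minimal sketch of action discretization for a DQN-style agent. It uses plain uniform binning of the steering command, which is a simpler stand-in for the tile coding mentioned above, not the paper's implementation. The function names, the number of bins, and the [-1, 1] steering range are all assumptions chosen for illustration.

```python
from typing import List


def make_steering_bins(n_bins: int, low: float = -1.0, high: float = 1.0) -> List[float]:
    """Return n_bins evenly spaced steering values covering [low, high].

    Hypothetical helper: the range and bin count are illustrative assumptions.
    """
    if n_bins < 2:
        raise ValueError("need at least two bins")
    step = (high - low) / (n_bins - 1)
    return [low + i * step for i in range(n_bins)]


def nearest_bin(value: float, bins: List[float]) -> int:
    """Index of the discrete action closest to a continuous steering value."""
    return min(range(len(bins)), key=lambda i: abs(bins[i] - value))


# A discrete agent can only pick one of these steering values.
bins = make_steering_bins(5)

# A continuous policy could output 0.3 directly; the discrete agent
# has to settle for the nearest available bin.
desired = 0.3
idx = nearest_bin(desired, bins)
quantization_error = abs(bins[idx] - desired)
```

With coarse bins the gap between the desired and the available steering value can be large, which is one plausible reason a discrete agent steers more abruptly than a continuous one.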

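Experimental Results

Research Questions

RQ1: Can an end-to-end DRL model learn lane keeping from raw sensor inputs alone, without hand-crafted features?
RQ2: How do the discrete (DQN) and continuous (DDAC) action formulations compare in learning performance and trajectory smoothness?
RQ3: How do different termination conditions affect the learning convergence time of DRL-based lane keeping?
RQ4: Does DDAC provide smoother control and better performance than DQN on curved sections?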
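RQ2 and RQ4 both ask about smoothness, so it helps to see one way it could be quantified. The sketch below computes the mean absolute change in steering between consecutive time steps. This metric and the function name are assumptions for illustration; the summary does not state which smoothness measure the paper uses, and the two sequences are made-up values, not results from the paper.

```python
from typing import Sequence


def mean_abs_steering_change(steering: Sequence[float]) -> float:
    """Average absolute difference between consecutive steering commands.

    Lower values mean smoother control. Hypothetical metric, not taken
    from the paper.
    """
    if len(steering) < 2:
        return 0.0
    diffs = [abs(b - a) for a, b in zip(steering, steering[1:])]
    return sum(diffs) / len(diffs)


# Made-up steering traces for illustration only.
jumpy_trace = [0.0, 0.5, 0.0, 0.5, 0.0]    # switches between two coarse bins
gradual_trace = [0.0, 0.1, 0.2, 0.3, 0.4]  # small incremental corrections

jumpy_score = mean_abs_steering_change(jumpy_trace)
gradual_score = mean_abs_steering_change(gradual_trace)
```

Under this measure the trace that alternates between coarse values scores higher (less smooth) than the gradual one.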

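Key Findings

Compared with DQN using tile-coded discrete actions, DDAC gives smoother steering and better performance on curved sections.
DQN with tile coding converges faster in some settings but may produce more abrupt steering actions.
The setting without termination conditions converges faster than the settings with restricted termination, but risks poor exploration and local minima.
Restricting the termination conditions generally increases convergence time because episodes must be reset more often.
On straight sections the two methods perform similarly; on curved sections DDAC outperforms DQN.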
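The four termination settings compared in these findings can be sketched as a single episode-end predicate. This is a minimal illustration under stated assumptions, not the paper's code: the names `TerminationMode` and `episode_done`, the off-track test `abs(track_pos) > 1.0`, the stuck threshold `stuck_speed`, and the reading of "out of track and stuck" as "either condition ends the episode" are all hypothetical choices.

```python
from enum import Enum


class TerminationMode(Enum):
    """The four settings listed in the summary (names are illustrative)."""
    NONE = "no termination"
    OUT_OF_TRACK = "out of track"
    STUCK = "stuck"
    OUT_OF_TRACK_AND_STUCK = "out of track and stuck"


def episode_done(mode: TerminationMode, track_pos: float, speed: float,
                 stuck_speed: float = 1.0) -> bool:
    """Decide whether to reset the episode under the given setting.

    Assumptions: track_pos is a normalized lateral offset where a magnitude
    above 1.0 means the car has left the track, and a speed below
    stuck_speed counts as stuck.
    """
    off_track = abs(track_pos) > 1.0
    stuck = speed < stuck_speed
    if mode is TerminationMode.NONE:
        return False
    if mode is TerminationMode.OUT_OF_TRACK:
        return off_track
    if mode is TerminationMode.STUCK:
        return stuck
    # Assumed reading of the combined setting: both checks are active,
    # and either one ends the episode.
    return off_track or stuck


# The same state is handled differently depending on the setting.
state = {"track_pos": 1.2, "speed": 20.0}  # off the track but still moving
results = {mode.name: episode_done(mode, **state) for mode in TerminationMode}
```

The more checks are active, the more states trigger a reset, which matches the finding that restricted termination leads to more frequent episode resets.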


This review was generated by AI and reviewed by a human editor.