QUICK REVIEW

[Paper Review] End-to-End Deep Reinforcement Learning for Lane Keeping Assist

Ahmad El Sallab, Mohammed Abdou|arXiv (Cornell University)|Dec 13, 2016

Reinforcement Learning in Robotics|References: 21|Citations: 142

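One-Line Summary

This paper explores end-to-end deep reinforcement learning for lane keeping in TORCS with discrete (DQN) and continuous (DDAC) action spaces, compares the performance of the two, and examines how termination conditions affect learning convergence.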

ABSTRACT

Reinforcement learning is considered to be a strong AI paradigm which can be used to teach machines through interaction with the environment and learning from their mistakes, but it has not yet been successfully used for automotive applications. There has recently been a revival of interest in the topic, however, driven by the ability of deep learning algorithms to learn good representations of the environment. Motivated by Google DeepMind's successful demonstrations of learning for games from Breakout to Go, we will propose different methods for autonomous driving using deep reinforcement learning. This is of particular interest as it is difficult to pose autonomous driving as a supervised learning problem as it has a strong interaction with the environment including other vehicles, pedestrians and roadworks. As this is a relatively new area of research for autonomous driving, we will formulate two main categories of algorithms: 1) Discrete actions category, and 2) Continuous actions category. For the discrete actions category, we will deal with Deep Q-Network Algorithm (DQN) while for the continuous actions category, we will deal with Deep Deterministic Actor Critic Algorithm (DDAC). In addition to that, We will also discover the performance of these two categories on an open source car simulator for Racing called (TORCS) which stands for The Open Racing car Simulator. Our simulation results demonstrate learning of autonomous maneuvering in a scenario of complex road curvatures and simple interaction with other vehicles. Finally, we explain the effect of some restricted conditions, put on the car during the learning phase, on the convergence time for finishing its learning phase.

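Motivation and Objectives

Motivates the use of reinforcement learning for autonomous driving, given the interactive nature of the driving environment.
Investigates end-to-end models that map raw sensor inputs to driving actions without hand-crafted features.
Compares discrete-action (DQN) and continuous-action (DDAC) DRL approaches for lane keeping.
Evaluates the effect of restrictive termination conditions on learning convergence time.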

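Proposed Method

Formulates lane keeping as a DRL problem using sensor fusion of camera, LIDAR, and radar inputs.
Applies two DRL paradigms: Deep Q-Network (DQN) for discrete actions and Deep Deterministic Actor-Critic (DDAC) for continuous actions.
Trains an end-to-end network on the TORCS simulator with trackPos and vehicle speed as inputs and steering, gear, acceleration, and brake as outputs.
Uses tile coding to discretize actions for DQN, and actor-critic policy gradients for DDAC.
Evaluates performance on tracks with straight and curved sections, comparing convergence and trajectory quality.
Examines the effect of termination conditions (No termination, Out of Track, Stuck, Out of Track with Stuck) on convergence time.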
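
The action discretization and the four termination settings listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: uniform binning stands in for the tile coding the review mentions, every name (`make_steering_bins`, `nearest_bin`, `Termination`, `CarState`, `should_terminate`) and the `stuck_speed` threshold are hypothetical, trackPos is assumed to be normalized so that an absolute value above 1 means the car is off the track, and "Out of Track with Stuck" is assumed to mean that either condition ends the episode.

```python
from dataclasses import dataclass
from enum import Enum


def make_steering_bins(n_bins: int, low: float = -1.0, high: float = 1.0) -> list[float]:
    """Return n_bins evenly spaced steering values covering [low, high]."""
    if n_bins < 2:
        raise ValueError("need at least two bins")
    step = (high - low) / (n_bins - 1)
    return [low + i * step for i in range(n_bins)]


def nearest_bin(value: float, bins: list[float]) -> int:
    """Index of the discrete action closest to a continuous steering value."""
    return min(range(len(bins)), key=lambda i: abs(bins[i] - value))


class Termination(Enum):
    """The four termination settings named in the review."""
    NONE = "No termination"
    OUT_OF_TRACK = "Out of Track"
    STUCK = "Stuck"
    OUT_OF_TRACK_WITH_STUCK = "Out of Track with Stuck"


@dataclass
class CarState:
    track_pos: float  # lateral offset; |track_pos| > 1 is assumed to be off track
    speed: float      # longitudinal speed (units are an assumption)


def should_terminate(state: CarState, mode: Termination, stuck_speed: float = 1.0) -> bool:
    """Decide whether the episode ends under the given termination setting."""
    out_of_track = abs(state.track_pos) > 1.0
    stuck = state.speed < stuck_speed  # hypothetical threshold for "stuck"
    if mode is Termination.OUT_OF_TRACK:
        return out_of_track
    if mode is Termination.STUCK:
        return stuck
    if mode is Termination.OUT_OF_TRACK_WITH_STUCK:
        return out_of_track or stuck  # assumed reading: either condition resets
    return False  # Termination.NONE never ends the episode early
```

With five bins, a DQN-style agent would choose among the steering values -1.0, -0.5, 0.0, 0.5, and 1.0, whereas a DDAC-style actor would output any value in the range directly. The stricter the termination mode, the more often `should_terminate` returns true and the episode resets.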

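Experimental Results

Research Questions

RQ1: Can an end-to-end DRL model learn lane keeping from raw sensor inputs without hand-crafted features?
RQ2: How do discrete (DQN) and continuous (DDAC) action formulations compare in learning effectiveness and trajectory smoothness?
RQ3: How do different termination conditions affect learning convergence time in DRL-based lane keeping?
RQ4: Does DDAC provide smoother control and better performance than DQN on curved sections?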

Key Findings

DDAC shows smoother steering and better performance on curved sections than DQN with tile-coded discrete actions.
DQN with tile coding converges faster in some settings, but its steering actions can be more abrupt.
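Training without a termination condition converges faster than training with restrictive termination conditions, but it reduces exploration and risks getting stuck in local minima.
Restricting the termination conditions generally lengthens convergence time because episodes reset more often.
On straight sections both methods perform comparably; on curved sections DDAC outperforms DQN.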
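
One way to make "smoother steering" concrete is to measure the mean absolute change between consecutive steering commands. The metric below is an illustrative choice and is not taken from the paper; the function name `steering_jerkiness` and both example traces are hypothetical, made up only to show how a coarse discrete controller and a continuous one might differ.

```python
def steering_jerkiness(steering: list[float]) -> float:
    """Mean absolute change between consecutive steering commands.

    Lower values indicate smoother control. Returns 0.0 when there are
    fewer than two samples, since no change can be measured.
    """
    if len(steering) < 2:
        return 0.0
    diffs = [abs(b - a) for a, b in zip(steering, steering[1:])]
    return sum(diffs) / len(diffs)


# Synthetic traces for illustration only (not measured results).
discrete_trace = [0.0, 0.5, 0.0, 0.5, 0.0]        # jumps between coarse bins
continuous_trace = [0.0, 0.125, 0.25, 0.125, 0.0]  # small gradual corrections

discrete_score = steering_jerkiness(discrete_trace)
continuous_score = steering_jerkiness(continuous_trace)
```

On these made-up traces the discrete controller scores higher (less smooth) than the continuous one, which is the kind of difference the findings above describe for curved sections.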

This review was generated by AI and checked by a human editor.