QUICK REVIEW

[Paper Review] End-to-End Deep Reinforcement Learning for Lane Keeping Assist

Ahmad El Sallab, Mohammed Abdou | arXiv (Cornell University) | December 13, 2016

Reinforcement Learning in Robotics | References: 21 | Citations: 142

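One-Line Summary

This paper explores end-to-end deep reinforcement learning for lane keeping in TORCS with discrete (DQN) and continuous (DDAC) action spaces, compares their performance, and analyzes how termination constraints affect learning convergence.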

ABSTRACT

Reinforcement learning is considered to be a strong AI paradigm which can be used to teach machines through interaction with the environment and learning from their mistakes, but it has not yet been successfully used for automotive applications. There has recently been a revival of interest in the topic, however, driven by the ability of deep learning algorithms to learn good representations of the environment. Motivated by Google DeepMind's successful demonstrations of learning for games from Breakout to Go, we will propose different methods for autonomous driving using deep reinforcement learning. This is of particular interest as it is difficult to pose autonomous driving as a supervised learning problem as it has a strong interaction with the environment including other vehicles, pedestrians and roadworks. As this is a relatively new area of research for autonomous driving, we will formulate two main categories of algorithms: 1) Discrete actions category, and 2) Continuous actions category. For the discrete actions category, we will deal with Deep Q-Network Algorithm (DQN) while for the continuous actions category, we will deal with Deep Deterministic Actor Critic Algorithm (DDAC). In addition to that, We will also discover the performance of these two categories on an open source car simulator for Racing called (TORCS) which stands for The Open Racing car Simulator. Our simulation results demonstrate learning of autonomous maneuvering in a scenario of complex road curvatures and simple interaction with other vehicles. Finally, we explain the effect of some restricted conditions, put on the car during the learning phase, on the convergence time for finishing its learning phase.

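Motivation and Objectives

Argues that reinforcement learning is needed for autonomous driving because the driving environment is interactive.
Investigates end-to-end models that map raw sensor inputs to driving actions without hand-crafted features.
Compares discrete-action (DQN) and continuous-action (DDAC) DRL approaches to lane keeping.
Evaluates how restrictive termination conditions affect learning convergence time.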

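Proposed Method

Formulates lane keeping as a DRL problem through sensor fusion of camera, lidar, and radar inputs.
Applies two DRL paradigms: Deep Q-Network (DQN) for discrete actions and Deep Deterministic Actor-Critic (DDAC) for continuous actions.
Trains end-to-end networks in the TORCS simulator, using trackPos and car speed as inputs and steering, gear, acceleration, and brake as outputs.
DQN discretizes actions with tile coding, while DDAC uses a policy gradient in an actor-critic setup.
Evaluates performance on straight and curved track sections to compare convergence and trajectory quality.
Examines how the termination conditions (No termination, Out of Track, Stuck, Out of Track with Stuck) affect convergence time.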
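
To make the discrete versus continuous distinction concrete, here is a minimal Python sketch of how a steering command might be chosen in each case. It is an illustration only, not the authors' implementation: the function names, the five evenly spaced bins, and the [-1, 1] steering range are all assumptions, and the evenly spaced grid is a simple stand-in for the tile coding the paper describes.

```python
from typing import List, Sequence


def make_steering_bins(n_bins: int = 5, low: float = -1.0, high: float = 1.0) -> List[float]:
    """Return n_bins evenly spaced steering values covering [low, high].

    Assumption: a uniform grid stands in for the paper's tile coding.
    """
    step = (high - low) / (n_bins - 1)
    return [low + i * step for i in range(n_bins)]


def greedy_discrete_action(q_values: Sequence[float], bins: Sequence[float]) -> float:
    """DQN-style choice: pick the steering bin with the largest Q-value."""
    best_index = max(range(len(q_values)), key=lambda i: q_values[i])
    return bins[best_index]


def continuous_action(actor_output: float, low: float = -1.0, high: float = 1.0) -> float:
    """Actor-critic-style choice: use the actor's real-valued output, clipped to the range."""
    return max(low, min(high, actor_output))


# Hypothetical Q-values, one per steering bin.
bins = make_steering_bins(5)
q_values = [0.1, 0.4, 0.2, 0.9, 0.3]
discrete_steer = greedy_discrete_action(q_values, bins)
continuous_steer = continuous_action(0.37)
```

With a discrete policy the steering command can only jump between neighboring bin values, which is one plausible reading of why the review describes DQN control as more abrupt than DDAC on curves.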

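Experimental Results

Research Questions

RQ1: Can an end-to-end DRL model learn lane keeping from raw sensor inputs without hand-designed features?
RQ2: How do discrete (DQN) and continuous (DDAC) action formulations compare in learning efficiency and trajectory smoothness?
RQ3: How do different termination conditions affect the learning convergence time of DRL-based lane keeping?
RQ4: Does DDAC provide smoother control and better performance than DQN on curved sections?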

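Key Findings

DDAC shows smoother steering and better performance on curved track sections than DQN with tiled discrete actions.
DQN with tile coding converges faster in some settings, but its control can be more abrupt.
Training without termination conditions converges faster than training with termination constraints, but risks poorer exploration and getting stuck in local optima.
Restricting the termination conditions causes more frequent episode resets, which generally increases convergence time.
On straight track sections the two methods behave similarly; on curved sections DDAC outperforms DQN.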
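
The four termination settings compared above could be expressed as a single episode-end check, sketched below in Python. This is a hypothetical illustration rather than the paper's code: the `Termination` enum, the `|trackPos| > 1` off-track test, the `patience` threshold of 100 slow steps, and the reading of "Out of Track with Stuck" as "either condition ends the episode" are all assumptions.

```python
from enum import Enum


class Termination(Enum):
    NONE = "no_termination"
    OUT_OF_TRACK = "out_of_track"
    STUCK = "stuck"
    OUT_OF_TRACK_WITH_STUCK = "out_of_track_with_stuck"


def is_out_of_track(track_pos: float, limit: float = 1.0) -> bool:
    """Assumption: the car has left the track when |trackPos| exceeds the limit."""
    return abs(track_pos) > limit


def is_stuck(slow_steps: int, patience: int = 100) -> bool:
    """Assumption: the car is stuck after `patience` consecutive low-speed steps."""
    return slow_steps >= patience


def should_terminate(mode: Termination, track_pos: float, slow_steps: int) -> bool:
    """Decide whether to reset the episode under the chosen termination setting."""
    off_track = is_out_of_track(track_pos)
    stuck = is_stuck(slow_steps)
    if mode is Termination.NONE:
        return False
    if mode is Termination.OUT_OF_TRACK:
        return off_track
    if mode is Termination.STUCK:
        return stuck
    # Combined setting, assumed here to end the episode if either condition fires.
    return off_track or stuck
```

Under this sketch, each added condition gives the episode one more way to end early, which matches the finding that stricter termination settings lead to more frequent resets.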

This review was generated by AI and reviewed by a human editor.