QUICK REVIEW

[Paper Review] Long-term Planning by Short-term Prediction

arXiv (Cornell University) | 2016-02-04

Adversarial Robustness in Machine Learning | References: 26 | Citations: 39

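One-line summary

This paper proposes a two-phase approach to long-term planning in autonomous driving that combines a differentiable short-term predictor with a recurrent neural network. By framing planning as supervised learning of the predictor followed by direct optimization over the recurrent model, it allows robust policies to be learned in non-Markovian, continuous, multi-agent environments.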

ABSTRACT

We consider planning problems, that often arise in autonomous driving applications, in which an agent should decide on immediate actions so as to optimize a long term objective. For example, when a car tries to merge in a roundabout it should decide on an immediate acceleration/braking command, while the long term effect of the command is the success/failure of the merge. Such problems are characterized by continuous state and action spaces, and by interaction with multiple agents, whose behavior can be adversarial. We argue that dual versions of the MDP framework (that depend on the value function and the $Q$ function) are problematic for autonomous driving applications due to the non Markovian of the natural state space representation, and due to the continuous state and action spaces. We propose to tackle the planning task by decomposing the problem into two phases: First, we apply supervised learning for predicting the near future based on the present. We require that the predictor will be differentiable with respect to the representation of the present. Second, we model a full trajectory of the agent using a recurrent neural network, where unexplained factors are modeled as (additive) input nodes. This allows us to solve the long-term planning problem using supervised learning techniques and direct optimization over the recurrent neural network. Our approach enables us to learn robust policies by incorporating adversarial elements to the environment.

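Research motivation and goals

To address long-term planning in autonomous driving, where state and action spaces are continuous.
To overcome the limitations of the conventional MDP framework that arise from non-Markovian state representations.
To enable robust policy learning in multi-agent, potentially adversarial environments.
To decompose complex planning into supervised learning and direct optimization over a recurrent sequence model.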

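Proposed method

Train a differentiable predictor that estimates the near-future state from the current observation.
Build a recurrent neural network (RNN) that models the agent's full trajectory, using the predictor as a differentiable component.
Model unexplained factors, including uncertainty and adversarial behavior, as additive input nodes of the RNN.
Optimize the whole system end to end with supervised learning objectives.
Incorporate adversarial elements into the environment during training to improve policy robustness.
Use backpropagation through time to optimize the long-horizon policy directly.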
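The steps above can be illustrated with a toy sketch. It assumes a scalar state, a hand-written predictor `s_next = s + a + nu` standing in for the learned differentiable predictor, and a one-parameter linear policy `a = theta * s`. None of these names or modeling choices come from the paper; they are hypothetical and chosen only to keep the example small. The additive term `nu` plays the role of the unexplained factors, and the gradient is obtained by backpropagating by hand through the unrolled trajectory.

```python
import random


def rollout(theta, s0, noise):
    """Unroll the toy predictor s_{t+1} = s_t + a_t + nu_t with a_t = theta * s_t.

    The predictor and the linear policy are illustrative assumptions,
    not the models used in the paper.
    """
    states = [s0]
    for nu in noise:
        action = theta * states[-1]
        states.append(states[-1] + action + nu)
    return states


def cost(states):
    """Long-horizon objective: sum of squared states after the initial one."""
    return sum(s * s for s in states[1:])


def bptt_grad(theta, states):
    """Gradient of cost w.r.t. theta via backpropagation through time."""
    grad = 0.0
    g = 0.0  # dL/ds_{t+1}, carried backwards along the trajectory
    for t in range(len(states) - 2, -1, -1):
        # s_{t+1} adds 2*s_{t+1} directly and feeds s_{t+2} with factor (1 + theta)
        g = 2.0 * states[t + 1] + (1.0 + theta) * g
        # theta enters step t through a_t = theta * s_t
        grad += g * states[t]
    return grad


def train(s0=1.0, horizon=5, steps=500, lr=0.005, seed=0):
    """Plain gradient descent on theta, resampling the additive noise each step."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(steps):
        noise = [rng.uniform(-0.1, 0.1) for _ in range(horizon)]
        states = rollout(theta, s0, noise)
        theta -= lr * bptt_grad(theta, states)
    return theta


if __name__ == "__main__":
    theta = train()
    print("learned theta:", theta)
```

In this toy setting the policy is pushed toward cancelling the state in a single step. Replacing the uniform noise with a worst-case choice of `nu` would be one simple way to mimic the adversarial elements mentioned above, though that variant is not shown here.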

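Experimental results

Research questions

RQ1: Can long-term planning be carried out with short-term prediction models in continuous, multi-agent environments?
RQ2: Can non-Markovian state representations be handled effectively in planning tasks?
RQ3: Can differentiable prediction combined with RNN-based trajectory modeling outperform conventional MDP-based planning in autonomous driving?
RQ4: How much does adversarial training contribute to the robustness of the learned policy?
RQ5: Can end-to-end optimization of the differentiable RNN architecture achieve effective long-horizon control?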

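Key findings

The proposed method handles long-term planning in environments with continuous state and action spaces.
Differentiable short-term prediction makes backpropagation through time usable for long-horizon optimization.
Incorporating adversarial elements during training improves policy robustness.
The RNN-based trajectory model captures unexplained factors through additive input nodes.
Separating prediction from planning avoids the limitations of the dual MDP formulations, which depend on the value function and the Q function.
End-to-end training with supervised learning techniques yields stable policies that generalize.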


This review was generated by AI and reviewed by a human editor.