QUICK REVIEW

[Paper Review] Uncertainty-Aware Reinforcement Learning for Collision Avoidance

Gregory Kahn, Adam Villaflor|arXiv (Cornell University)|Feb 3, 2017

Reinforcement Learning in Robotics | 27 references | 226 citations

TL;DR

The paper presents an uncertainty-aware, model-based RL method that predicts collision probability with neural networks and estimates the uncertainty of that prediction via bootstrapping and dropout. Both feed a speed-dependent collision cost, so the robot explores safely while still learning collision avoidance effectively.

ABSTRACT

Reinforcement learning can enable complex, adaptive behavior to be learned automatically for autonomous robotic platforms. However, practical deployment of reinforcement learning methods must contend with the fact that the training process itself can be unsafe for the robot. In this paper, we consider the specific case of a mobile robot learning to navigate an a priori unknown environment while avoiding collisions. In order to learn collision avoidance, the robot must experience collisions at training time. However, high-speed collisions, even at training time, could damage the robot. A successful learning method must therefore proceed cautiously, experiencing only low-speed collisions until it gains confidence. To this end, we present an uncertainty-aware model-based learning algorithm that estimates the probability of collision together with a statistical estimate of uncertainty. By formulating an uncertainty-dependent cost function, we show that the algorithm naturally chooses to proceed cautiously in unfamiliar environments, and increases the velocity of the robot in settings where it has high confidence. Our predictive model is based on bootstrapped neural networks using dropout, allowing it to process raw sensory inputs from high-bandwidth sensors such as cameras. Our experimental evaluation demonstrates that our method effectively minimizes dangerous collisions at training time in an obstacle avoidance task for a simulated and real-world quadrotor, and a real-world RC car. Videos of the experiments can be found at https://sites.google.com/site/probcoll.

Motivation & Objective

Enable safe learning in a priori unknown environments, where the robot must experience collisions during training in order to learn to avoid them.
Develop an uncertainty-aware collision prediction model that operates on raw sensory inputs.
Incorporate a speed-dependent collision cost that leverages uncertainty to balance safety and task progress.
Demonstrate the approach on simulated and real-world robots (quadrotor and RC car) and compare to non-uncertainty baselines.

Proposed method

Use a model-based RL framework with receding-horizon MPC for navigation under uncertainty.
Predict collision probability with a neural network that outputs a Bernoulli parameter for P(coll | x, u, o) within the planning horizon.
Define a risk-averse collision probability P~(coll | x, u, o) by adding a standard-deviation term, scaled by lambda_std, to the network's pre-activation output before the final activation is applied.
Introduce a velocity-dependent collision cost C_coll = lambda_coll * ||vel||^2 to penalize high-speed near-collisions.
Estimate uncertainty via bootstrapping and dropout, which yield the mean E[f_theta] and variance Var[f_theta] of the pre-activation output used in the risk-averse probability.
Iteratively collect trajectories with MPC, update the collision predictor with the new data, and repeat.
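
The risk-averse probability and the speed-dependent cost listed above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names are hypothetical, the sigmoid is assumed as the final activation (consistent with a Bernoulli output), the spread of the bootstrap/dropout samples stands in for sqrt(Var[f_theta]), and weighting the cost by the risk-averse probability is an assumption about how the two terms are combined.

```python
import math
from statistics import mean, pstdev


def sigmoid(z):
    """Logistic function mapping a pre-activation value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def risk_averse_collision_prob(logits, lambda_std):
    """Risk-averse collision probability from sampled pre-activation outputs.

    logits: pre-activation outputs f_theta, one per bootstrap/dropout sample.
    lambda_std: weight on the standard-deviation term (higher = more cautious).
    """
    mu = mean(logits)       # estimate of E[f_theta]
    sigma = pstdev(logits)  # estimate of sqrt(Var[f_theta])
    return sigmoid(mu + lambda_std * sigma)


def collision_cost(speed, lambda_coll):
    """Speed-dependent collision cost: lambda_coll * ||vel||^2."""
    return lambda_coll * speed ** 2


def expected_collision_cost(logits, speed, lambda_std, lambda_coll):
    """Assumption: weight the collision cost by the risk-averse probability."""
    p = risk_averse_collision_prob(logits, lambda_std)
    return p * collision_cost(speed, lambda_coll)
```

With lambda_std = 0 the estimate reduces to the mean prediction. Any positive lambda_std raises the estimated collision probability whenever the sampled outputs disagree, which in turn makes high speeds more expensive.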

Experimental results

Research questions

RQ1: Does incorporating uncertainty into the collision prediction improve safety during training without excessively harming task performance?
RQ2: Can bootstrapping and dropout provide meaningful uncertainty estimates for high-dimensional sensory inputs in robotics?
RQ3: How does a speed-dependent, uncertainty-aware collision cost affect exploration and learning efficiency in unknown environments?

Key findings

Uncertainty-aware planning reduces dangerous collisions during training compared to non-uncertainty baselines.
The weight on the uncertainty term (lambda_std) controls the trade-off between training-time safety and final task performance.
Compared to a constant-penalty baseline, the uncertainty-aware method better balances safety and progress rather than becoming uniformly conservative.
Real-world experiments with a quadrotor and an RC car demonstrate the method’s applicability to real sensors and tasks.
The method enables safe exploration: the robot moves slowly where the model is uncertain and speeds up where the model is confident.
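
The speed behavior described in the findings above can be reproduced in a toy setting. The sketch below is illustrative only and reports no result from the paper: the quadratic task cost around a desired speed `v_des`, the candidate speed grid, the sigmoid activation, the probability-weighted collision cost, and all numeric values are assumptions chosen for the example.

```python
import math
from statistics import mean, pstdev


def risk_averse_prob(logits, lambda_std):
    """Sigmoid of the mean plus a scaled standard deviation of sampled outputs."""
    z = mean(logits) + lambda_std * pstdev(logits)
    return 1.0 / (1.0 + math.exp(-z))


def choose_speed(logits, candidates, v_des, lambda_std, lambda_coll):
    """Pick the candidate speed with the lowest total cost.

    Total cost = hypothetical task cost (v_des - v)^2
               + risk-averse probability * lambda_coll * v^2.
    """
    p = risk_averse_prob(logits, lambda_std)

    def total_cost(v):
        return (v_des - v) ** 2 + p * lambda_coll * v ** 2

    return min(candidates, key=total_cost)


candidates = [0.0, 0.5, 1.0, 1.5, 2.0]

# Same mean prediction in both cases; only the ensemble disagreement differs.
confident_logits = [-4.0, -4.0, -4.0]
uncertain_logits = [-8.0, -4.0, 0.0]

fast = choose_speed(confident_logits, candidates, v_des=2.0,
                    lambda_std=1.0, lambda_coll=10.0)
slow = choose_speed(uncertain_logits, candidates, v_des=2.0,
                    lambda_std=1.0, lambda_coll=10.0)
```

Both ensembles share the same mean prediction, so a planner that ignored uncertainty would pick the same speed in both cases. With the standard-deviation term included, the disagreeing ensemble selects a lower speed than the confident one.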

This review was created by AI and reviewed by human editors.