QUICK REVIEW

[Paper Review] Autonomous UAV Navigation Using Reinforcement Learning

Huy Xuan Pham, Hung Manh La | arXiv (Cornell University) | Jan 16, 2018

Reinforcement Learning in Robotics | 18 references | 48 citations

TL;DR

The paper presents a PID-assisted Q-learning framework enabling a UAV to navigate unknown environments by learning a policy on a discretized state space, demonstrated in both simulation (5x5 grid) and real-world indoor flight with an AR Drone 2.0.

ABSTRACT

Unmanned aerial vehicles (UAV) are commonly used for missions in unknown environments, where an exact mathematical model of the environment may not be available. This paper provides a framework for using reinforcement learning to allow the UAV to navigate successfully in such environments. We conducted our simulation and real implementation to show how the UAVs can successfully learn to navigate through an unknown environment. Technical aspects regarding to applying reinforcement learning algorithm to a UAV system and UAV flight control were also addressed. This will enable continuing research using a UAV with learning capabilities in more important applications, such as wildfire monitoring, or search and rescue missions.

Motivation & Objective

Enable UAV navigation in unknown environments without a predefined map or model.
Propose a reinforcement learning framework using Q-learning to learn navigation policies.
Demonstrate the approach on a quadrotor in both simulation and a real-world implementation.
Demonstrate integration of a PID controller to translate learned actions into stable UAV motion.

Proposed method

Model the environment as a finite, discretized state space in which each state is a circle centered on a grid cell, with the UAV held at constant altitude.
Apply Q-learning with a tabular Q-table to learn state-action values and an epsilon-greedy policy for exploration/exploitation.
Define four discrete lateral actions (North, West, South, East) and a reward scheme in which reaching the goal yields +100 and every other transition yields -1.
Incorporate a PID controller to drive the UAV from the current state to the next state and hover within a distance d (0.3 m in tuning results).
Couple the learned high-level policy with a low-level position controller, so that the UAV's nonlinear dynamics are handled by the controller rather than by the learner.
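
As a rough illustration of the tabular Q-learning setup listed above, the following Python sketch trains an epsilon-greedy agent on a 5x5 grid with the four lateral actions and the +100 / -1 reward scheme. The learning rate, discount factor, exploration rate, episode count, start and goal cells, and the rule that a move off the grid leaves the state unchanged are all assumptions made for this example, not values taken from the paper. The PID layer is left out here, so each action is treated as an ideal one-cell move.

```python
import random

GRID = 5                       # 5x5 grid, as in the reviewed simulation
START, GOAL = (0, 0), (4, 4)   # assumed: opposite corners, 8 moves apart
ACTIONS = {"N": (0, 1), "W": (-1, 0), "S": (0, -1), "E": (1, 0)}
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2  # assumed hyperparameters


def step(state, action):
    """Apply one move; a move off the grid leaves the state unchanged (assumed)."""
    dx, dy = ACTIONS[action]
    x, y = state[0] + dx, state[1] + dy
    if not (0 <= x < GRID and 0 <= y < GRID):
        x, y = state
    nxt = (x, y)
    reward = 100 if nxt == GOAL else -1
    return nxt, reward, nxt == GOAL


def train(episodes=3000, max_steps=100, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = {(x, y): {a: 0.0 for a in ACTIONS}
         for x in range(GRID) for y in range(GRID)}
    for _ in range(episodes):
        state = START
        for _ in range(max_steps):
            if rng.random() < EPSILON:
                action = rng.choice(list(ACTIONS))        # explore
            else:
                action = max(q[state], key=q[state].get)  # exploit
            nxt, reward, done = step(state, action)
            # Q-learning target: bootstrap from the best next action,
            # except at the goal, where the episode terminates.
            target = reward if done else reward + GAMMA * max(q[nxt].values())
            q[state][action] += ALPHA * (target - q[state][action])
            state = nxt
            if done:
                break
    return q


def greedy_path(q, max_steps=50):
    """Follow the greedy policy from START until GOAL or a step limit."""
    state, path = START, [START]
    while state != GOAL and len(path) <= max_steps:
        action = max(q[state], key=q[state].get)
        state, _, _ = step(state, action)
        path.append(state)
    return path
```

Calling `greedy_path(train())` returns the list of cells visited by the learned greedy policy. Under these assumed settings the route should approach the 8-move Manhattan distance between the corners, though the number of episodes needed is not comparable to the 39 reported, because the hyperparameters here are guesses.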

Experimental results

Research questions

RQ1: Can a UAV learn to navigate from arbitrary starting positions to a predefined goal in an unknown environment using Q-learning?
RQ2: Does integrating a PID controller improve stability and precision when executing learned actions on a real UAV?
RQ3: How many episodes are required for convergence to an optimal path in simulation and in real hardware?
RQ4: What is the performance (path length, convergence) of the learned policy in a discretized 2-D environment compared to an ideal shortest path?

Key findings

In simulation, the UAV needs 39 episodes to learn the optimal path, an 8-step shortest trajectory from start to goal.
In real hardware, the AR Drone 2.0 requires 38 episodes to discover the optimal 8-step path to the goal.
The UAV achieves hovering accuracy within a radius of 0.3 m from the target after tuning the PID gains.
The learning setup uses a reward of +100 for reaching the goal and -1 otherwise, so every extra step is penalized and shorter paths are favored.
PID gains used in the real implementation were Kp=0.8, Kd=0.9, Ki=0 (to stabilize hover and reduce overshoot).
In the final episode, the UAV's trajectory follows the shortest possible path to the goal.
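
To make the role of the reported gains more concrete, here is a minimal Python sketch of a PID position loop on a single axis, using Kp=0.8, Ki=0, Kd=0.9 and the 0.3 m hover radius from the findings above. Everything else is an assumption for illustration: the vehicle is modeled as a point mass whose acceleration equals the controller output, the target is 1 m away, and the time step and settle time are arbitrary. The real AR Drone 2.0 dynamics and command interface are not modeled, and `fly_to` is a hypothetical helper, not a function from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PID:
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, error: float, dt: float) -> float:
        """Return the control output for the current position error."""
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0  # no derivative estimate on the first sample
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def fly_to(target, start=0.0, dt=0.05, tolerance=0.3,
           settle_time=2.0, max_time=60.0):
    """Simulate one axis until the vehicle has stayed within `tolerance`
    of `target` for `settle_time` seconds.

    Returns (time, position) on success, or None if max_time is reached.
    """
    pid = PID(kp=0.8, ki=0.0, kd=0.9)  # gains reported for the real flights
    pos, vel, t, inside = start, 0.0, 0.0, 0.0
    while t < max_time:
        accel = pid.update(target - pos, dt)  # assumed: output acts as acceleration
        vel += accel * dt
        pos += vel * dt
        t += dt
        # Count how long the vehicle has remained inside the hover radius.
        inside = inside + dt if abs(target - pos) <= tolerance else 0.0
        if inside >= settle_time:
            return t, pos
    return None
```

In a combined setup, a routine like `fly_to` would be called once per learned action, with the center of the next grid cell as the target, and the Q-learning update would run only after the hover condition is met. With Ki=0 the loop is effectively a PD controller, which is consistent with the stated aim of stabilizing hover and reducing overshoot.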


This review was created by AI and reviewed by human editors.